Abstract

Increasing sharing in programs is desirable to compactify the code,
and to avoid duplication of reduction work at run-time, thereby
speeding up execution. We show how a maximal degree of sharing
can be obtained for programs expressed as terms in the lambda
calculus with letrec. We introduce a notion of 'maximal compactness'
for λ_letrec-terms among all terms with the same infinite unfolding.
Instead of being defined purely syntactically, this notion is based on
a graph semantics. λ_letrec-terms are interpreted as first-order term
graphs so that unfolding equivalence between terms is preserved and
reflected through bisimilarity of the term graph interpretations.
Compactness of the term graphs can then be compared via functional
bisimulation.

We describe practical and efficient methods for the following
two problems: transforming a λ_letrec-term into a maximally compact
form; and deciding whether two λ_letrec-terms are unfolding-equivalent.
The transformation of a λ_letrec-term L into maximally compact
form L₀ proceeds in three steps: (i) translate L into its term graph
G = ⟦L⟧; (ii) compute the maximally shared form of G as its
bisimulation collapse G₀; (iii) read back a λ_letrec-term L₀ from the
term graph G₀ with the property ⟦L₀⟧ = G₀. Then L₀ represents a
maximally shared term graph, and it has the same unfolding as L.

The procedure for deciding whether two given λ_letrec-terms L₁
and L₂ are unfolding-equivalent computes their term graph
interpretations ⟦L₁⟧ and ⟦L₂⟧, and checks whether these are bisimilar.

For illustration, we also provide a readily usable implementation. 

Categories and Subject Descriptors D.3.3 [Language constructs 
and features]: Recursion; F.3.3 [Studies of Programming Con- 
structs]: Functional constructs 

General Terms functional programming, compiler optimisation 

Keywords Lambda Calculus with letrec; unfolding semantics; 
subterm sharing; maximal sharing; higher-order term graphs 

1. Introduction 

Explicit sharing in pure functional programming languages is
typically expressed by means of the letrec-construct, which facilitates
cyclic definitions. The λ-calculus with letrec, λ_letrec, forms a
syntactic core of these languages, and it can be viewed as their
abstraction. As such, λ_letrec is well-suited as a test bed for
developing program transformations in functional programming
languages. This certainly holds for the transformation presented here
that has a strong conceptual motivation, is justified by a form of
semantic reasoning, and is best described first for an expressive, yet
minimal language.

1.1 Expressing sharing and infinite λ-terms

For the programmer the letrec-construct offers the possibility to
write a program compactly by utilising subterm sharing.
letrec-expressions bind subterms to variables; these variables then
denote occurrences of the respective subterms and can be used anywhere
inside of the letrec-expression. In this way, instead of repeating
a subterm multiple times, a single definition can be given that is
referenced from multiple positions.

We will denote the construct letrec here by let as in Haskell. 

Example 1.1. Consider the λ-term (λx. x) (λx. x) with two
occurrences of the subterm λx. x. These occurrences can be shared
with as result the λ_letrec-term (let id = λx. x in id id).

As let-bindings permit definitions with cyclic dependencies,
terms in λ_letrec are able to finitely denote infinite λ-terms (for short:
λ∞-terms). The λ∞-term M represented by a λ_letrec-term L can be
obtained by a typically infinite process in which the let-bindings in
L are unfolded continually with M as result in the limit. Then we say
that M is the infinite unfolding of L, or that M is the denotation of L
in the unfolding semantics, indicated symbolically by M = ⟦L⟧_λ∞.

Example 1.2. For the λ_letrec-terms L and P and the λ∞-term M:

    L := λf. let r = f r in r
    P := λf. let r = f (f r) in r
    M := λf. f (f (…))

it holds that both L and P (which represent fixed-point combinators)
have M as their infinite unfolding: ⟦L⟧_λ∞ = ⟦P⟧_λ∞ = M.

L and P in this example are 'unfolding-equivalent'. Note that L
represents M in a more compact way than P. It is intuitively clear
that there is no λ_letrec-term that represents M more compactly than
L. So L can be called a 'maximally shared form' of P (and of M).

We address, and efficiently solve, the problems of computing the
maximally shared form of a λ_letrec-term, and of determining whether
two λ_letrec-terms are unfolding-equivalent. Note that these notions
are based on the static unfolding semantics. We do not consider any
dynamic semantics based on evaluation by β-reduction or otherwise.

1.2 Recognising potential for sharing 

A general risk for compilers of functional programs is "[to construct] 
multiple instances of the same expression, rather than sharing a 
single copy of them. This wastes space because each instance 
occupies separate storage, and it wastes time because the instances 
will be reduced separately. This waste can be arbitrarily large, [. . . ]"
([28, p. 243]). Therefore practical compilers increase sharing, and
do so typically for supercombinator translations of programs (such 
as fully-lazy lambda-lifting). Thereby two goals are addressed: to 
increase sharing based on a syntactical analysis of the 'static' form 
of the program; and to prevent splits into too many supercombinators 
when an anticipation of the program's 'dynamic' behaviour is able 
to conclude that no sharing at run-time will be gained. 

A well-known method for the 'static' part is common subexpression
elimination (CSE) [6]. For the 'dynamic' part, a predictive
syntactic program analysis has been proposed for fine-tuning sharing
of partial applications in supercombinator translations [10].

We focus primarily on the 'static' aspect of introducing sharing. 
We provide a conceptual solution that substantially extends CSE. 
But instead of maximising sharing for a supercombinator translation 
of a program, we carry out the optimisation on the program itself (the
λ_letrec-term). And instead of applying a purely syntactical program
analysis, we use a term graph semantics for λ_letrec-terms.

1.3 Approach based on a term graph semantics 

We develop a combination of techniques for realising maximal
sharing in λ_letrec-terms. For this we proceed in four steps:
λ_letrec-terms are interpreted as higher-order term graphs; the
higher-order term graphs are implemented as first-order term graphs;
maximally compact versions of such term graphs can be computed by
standard algorithms; λ_letrec-terms that represent compacted term
graphs (or in fact arbitrary ones) can be retrieved by a 'readback'
operation. In more detail, the four essential ingredients are the
following:

(1) A semantics ⟦·⟧_H for interpreting λ_letrec-terms as higher-order
term graphs, which are first-order term graphs enriched with a
feature for describing binding and scopes. We call this specific
kind of higher-order term graphs 'λ-ho-term-graphs'.

The variable binding structure is recorded in this term graph concept
because it must be respected by any addition of sharing. The term
graph interpretation adequately represents sharing as expressed by
a λ_letrec-term. It is not injective: a λ-ho-term-graph typically is the
interpretation of various λ_letrec-terms. Different degrees of sharing
as expressed by λ_letrec-terms can be compared via the
λ-ho-term-graph interpretations by a sharing preorder, which is
defined as the existence of a homomorphism (functional bisimulation).

While comparing higher-order term graphs via this preorder is 
computable in principle, standard algorithms do not apply. There- 
fore efficient solvability of the compactification problem and the 
comparison problem is, from the outset, not guaranteed. For this 
reason we devise a first-order implementation of λ-ho-term-graphs:

(2) An interpretation HT of λ-ho-term-graphs into a specific kind
of first-order term graphs, which we call 'λ-term-graphs'. It
preserves and reflects the sharing preorder.

HT reduces bisimilarity between λ-ho-term-graphs (higher-order)
to bisimilarity between λ-term-graphs (first-order), and facilitates:

(3) The use of standard methods for checking bisimilarity and for
computing the bisimulation collapse of λ-term-graphs. Via HT
also the analogous problems for λ-ho-term-graphs can be solved.

Term graphs can be represented as deterministic process graphs 
(labelled transition systems), and even as deterministic finite-state 
automata (DFAs). That is why it is possible to apply efficient algo- 
rithms for state minimisation and language equivalence of DFAs. 
Finally, an operation to return from term graphs to λ_letrec-terms:

(4) A readback function rb from λ-term-graphs to λ_letrec-terms that,
for every λ-term-graph G, computes a λ_letrec-term L from the set
of λ_letrec-terms that have G as their interpretation via ⟦·⟧_H and
HT (i.e. a λ_letrec-term for which it holds that HT(⟦L⟧_H) = G).
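
Ingredient (3) can be illustrated on plain first-order term graphs. The following Python sketch is only an illustration under stated assumptions, not the implementation referred to in this paper: the dictionary encoding (`lab`, `args`) and the function name `bisim_collapse` are chosen here, the refinement loop is a naive partition refinement rather than an efficient DFA-minimisation algorithm, and the two example graphs are hand-built encodings of the terms L and P from Ex. 1.2 without any scope information.

```python
def bisim_collapse(lab, args, root):
    """Quotient a finite, root-connected term graph by its largest
    self-bisimulation (naive partition refinement).

    lab:  dict vertex -> label
    args: dict vertex -> tuple of ordered successor vertices
    """
    block = dict(lab)  # initial partition: vertices grouped by label
    while True:
        # Signature of a vertex: its own block plus its successors' blocks.
        sig = {v: (block[v], tuple(block[w] for w in args[v])) for v in lab}
        ids = {}
        refined = {v: ids.setdefault(sig[v], len(ids)) for v in lab}
        # Refinement only splits blocks, so an equal count means stability.
        stable = len(ids) == len(set(block.values()))
        block = refined
        if stable:
            break
    # Factor graph: one vertex per block, structure induced by lab and args.
    q_lab = {block[v]: lab[v] for v in lab}
    q_args = {block[v]: tuple(block[w] for w in args[v]) for v in lab}
    return q_lab, q_args, block[root]


# Hand-built graph of P = λf. let r = f (f r) in r (five vertices).
P_lab = {"lam": "λ", "app1": "@", "var1": "0", "app2": "@", "var2": "0"}
P_args = {"lam": ("app1",), "app1": ("var1", "app2"), "var1": ("lam",),
          "app2": ("var2", "app1"), "var2": ("lam",)}

# Hand-built graph of L = λf. let r = f r in r (three vertices).
L_lab = {"lam": "λ", "app": "@", "var": "0"}
L_args = {"lam": ("app",), "app": ("var", "app"), "var": ("lam",)}

collapsed_P = bisim_collapse(P_lab, P_args, "lam")
collapsed_L = bisim_collapse(L_lab, L_args, "lam")
```

In this sketch the collapse of the graph of P has three vertices, like the graph of L. For λ-term-graphs in the sense of this paper, the vertex labels would additionally have to account for scopes.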





Figure 1. Component-step build-up of the methods for computing a
maximally shared form L₀ of a λ_letrec-term L (left), and for deciding
unfolding equivalence of λ_letrec-terms L₁ and L₂ via bisimilarity ↔̲
(right).

[Figure 2: term graphs of λf. let r = f r in r (left) and of
λf. let r = f (f r) in r (right).]

Figure 2. Computing a maximally compact version of the term P
from Ex. 1.2 (right) by using composition of term graph semantics
⟦·⟧_T, collapse |↓, and readback rb, yielding the term L (left).



1.4 Methods and their correctness 

On the basis of the concepts above we develop efficient methods for
introducing maximal sharing, and for checking unfolding
equivalence, of λ_letrec-terms, as sketched below.

In describing these methods, we use the following notation: 

H : class of λ-ho-term-graphs, the image of the semantics ⟦·⟧_H;
T : class of λ-term-graphs, the image of the interpretation HT;
⟦·⟧_T := HT ∘ ⟦·⟧_H : first-order term graph semantics for λ_letrec-terms;
|↓ : bisimulation collapse on H and T;
rb : readback mapping from λ-term-graphs to λ_letrec-terms.

We obtain the following methods (for illustrations, see Fig. 1):

▷ Maximal sharing: for a given λ_letrec-term, a maximally shared
form can be obtained by collapsing its first-order term graph
interpretation, and then reading back the collapse: rb ∘ |↓ ∘ ⟦·⟧_T.

▷ Unfolding equivalence: for given λ_letrec-terms L and P, it can
be decided whether ⟦L⟧_λ∞ = ⟦P⟧_λ∞ by checking whether their
term graph interpretations ⟦L⟧_T and ⟦P⟧_T are bisimilar.

See Fig. 2 for an illustration of the application of the maximal
sharing method to the λ_letrec-terms L and P from Ex. 1.2.

The correctness of these methods hinges on the fact that the term 
graph translation and the readback satisfy the following properties: 

(P1) λ_letrec-terms L and P have the same infinite unfolding if and
only if the term graphs ⟦L⟧_T and ⟦P⟧_T are bisimilar.

(P2) The class T of λ-term-graphs is closed under homomorphism.

(P3) The readback rb is a right inverse of ⟦·⟧_T up to isomorphism ≃,
that is, for all term graphs G ∈ T it holds: (⟦·⟧_T ∘ rb)(G) ≃ G.

Note: (P2) and (P3) will be established only for a subclass T_eag of T.
Furthermore, practicality of these methods depends on the property:

(P4) Translation ⟦·⟧_T and readback rb are efficiently computable.

1.5 Overview of the development 

In the Preliminaries (Section 2) we fix basic notions and notations for
first-order term graphs. λ_letrec-terms and their unfolding semantics
are defined in Section 3. In Section 4 we develop the concept of
'λ-ho-term-graph', which gives rise to the class H, and the
higher-order term graph semantics for λ_letrec-terms.

In Section 5 we develop the concept of first-order 'λ-term-graph'
in the class T, and define the interpretation HT of λ-ho-term-graphs
into λ-term-graphs as a mapping from H to T. This induces the
first-order term graph semantics ⟦·⟧_T := HT ∘ ⟦·⟧_H, for which we
also provide a direct inductive definition.

In Section 6 we define the readback rb with the desired property
as a function from λ-term-graphs to λ_letrec-terms. Subsequently in
Section 7 we report on the complexity of the described methods,
individually, and in total for the methods described in Subsection 1.4.

In Section 8 we link to our implementation of the presented
methods. Finally in Section 9 we explain easy modifications, describe
possible extensions, and sketch potential practical applications.

1.6 Applications and scalability 

While our contribution is at first a conceptual one, it holds the 
promise for a number of practical applications: 

• Increasing the efficiency of the execution of programs by trans- 
forming them into their maximally shared form at compile-time. 

• Increasing the efficiency of the execution of programs by repeat- 
edly compactifying the program at run time. 

• Improving systems for recognising program equivalence. 

• Providing feedback to the programmer, along the lines: 'This 
code has identical fragments and can be written more compactly.' 

These and a number of other potential applications are discussed in 
more detail in Section 9.

The presented methods scale well to larger inputs, due to the
quadratic bound on their runtime complexity (see Section 7).

1.7 Relationship with other concepts of sharing 

The maximal sharing method is targeted at increasing 'static' sharing 
in the sense that a program is transformed at compile time into a 
version with a higher degree of sharing. It is not (at least not a 
priori) a method for 'dynamic' sharing, i.e. for an evaluator that 
maintains a certain degree of sharing at run time, such as graph 
rewrite mechanisms for fully-lazy [31] or optimal evaluation [1] of
the λ-calculus. However, we envisage run-time collapsing of the
program's graph interpretation integrated with the evaluator (see
Section 9).

The term 'maximal sharing' stems from work on the ATERM
library [5]. It describes a technique for minimising memory usage
when representing a set of terms in a first-order term rewrite system 
(TRS). The terms are kept in an aggregate directed acyclic graph 
by which their syntax trees are shared as much as possible. Thereby 
terms are created only if they are entirely new; otherwise they are 
referenced by pointers to roots of sub-dags. Our use of the expression 
'maximal sharing' is inspired by that work, but our results generalise 
that approach in the following ways: 

• Instead of first-order terms we consider terms in a higher-order 
language with the letrec-construct for expressing sharing. 

• Since letrec typically defines cyclic sharing dependencies, we 
interpret terms as cyclic graphs instead of just dags. 

• We are interested in increasing sharing by bisimulation collapse 
instead of by identifying isomorphic sub-dags. 



ATERM only checks for equality of subexpressions. Therefore
it only introduces horizontal sharing and implements a form of
common subexpression elimination (CSE) [28, p. 241]. Our approach
is stronger than CSE: while Ex. 1.1 can be handled by CSE, this is
not the case for Ex. 1.2. In contrast to CSE, our approach increases
also vertical and twisted sharing (see footnote 2).
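
The ATERM-style technique described above, creating a term only if it is entirely new and otherwise returning a pointer to the existing one, is commonly known as hash-consing. The following Python sketch illustrates the idea under stated assumptions: the class `HashCons` and its method `mk` are hypothetical names introduced here, not part of the ATERM library, and the example encodes the term of Ex. 1.1 in a nameless first-order form chosen for this illustration.

```python
class HashCons:
    """Table that creates a term node only if it is entirely new."""

    def __init__(self):
        self.table = {}

    def mk(self, label, *children):
        # Children are assumed to come from this same table, so their
        # identities are stable and can serve as part of the key.
        key = (label, tuple(id(c) for c in children))
        node = self.table.get(key)
        if node is None:
            node = (label, children)
            self.table[key] = node
        return node


hc = HashCons()

def identity():
    # Nameless first-order stand-in for λx. x (assumption of this sketch).
    return hc.mk("λ", hc.mk("0"))

# (λx. x) (λx. x): both arguments end up as one shared node.
shared = hc.mk("@", identity(), identity())
```

Such a table identifies equal subterms of acyclic terms only. It has no means of recognising that the cyclic definitions of L and P in Ex. 1.2 denote the same unfolding.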

1.8 Contribution of this paper in context 

Blom introduces higher-order term graphs [4], which are extensions
of first-order term graphs by adding a scope function that assigns a 
set of vertices, its scope, to every abstraction vertex. 

In the paper [12] we introduced, for interpreting λ_letrec-terms, a
modification of Blom's higher-order term graphs (the
λ-ho-term-graphs of the class H) in which scopes are represented by
means of 'abstraction prefix functions'. We also investigated
first-order term graphs with scope-delimiter vertices. In particular,
we examined which specific class of first-order λ-term-graphs can
faithfully represent higher-order λ-ho-term-graphs in such a way that
compactification of the latter can be realised through bisimulation
collapse of the former (this led to the λ-term-graphs of the class T).

Whereas in the paper [12] we exclusively focused on the graph
formalisms, and investigated them in their own right, here we
connect the results obtained there to the language λ_letrec for
expressing sharing and cyclicity. Since the methods presented here are
based on the graph formalisms, and rely on their properties for
correctness, we recapitulate the concepts and the relevant results in
Sec. 4 and 5.

The translation ⟦·⟧_T of λ_letrec-terms into first-order term graphs
was inspired by related representations that use scope delimiters to
indicate end of scopes. Such representations are generalisations of
a de Bruijn index notation for λ-terms [8] in which the de Bruijn
indexes are numerals of the form S(. . . (S(0)) . . .). In the
generalised form, due to Patterson and Bird [3], the symbol S can occur
anywhere between a variable occurrence and its binding abstraction.
The idea to view S as a scope delimiter was employed by Hendriks
and van Oostrom, who defined an end-of-scope symbol ⅄ [17]. It
is also crucial for Lambdascope-graphs (interaction nets) on which
van Oostrom defines an optimal evaluator for the λ-calculus [26].

In the report [11] and in the paper [13] we used a closely related
higher-order rewrite system in order to precisely characterise those
λ∞-terms that can be expressed by (in the sense that they arise as
infinite unfoldings of) finite terms in λ_letrec, and respectively, in λ_μ.

2. Preliminaries 

By ℕ we denote the natural numbers including zero. For words w
over an alphabet A, the length of w is denoted by |w|.

Let Σ be a TRS-signature [30] with arity function ar : Σ → ℕ. A
term graph over Σ (or a Σ-term-graph) is a tuple ⟨V, lab, args, r⟩
where: V is a set of vertices, lab : V → Σ the (vertex) label function,
args : V → V* the argument function that maps every vertex v to
the word args(v) consisting of the ar(lab(v)) successor vertices
of v (hence |args(v)| = ar(lab(v))), and r, the root, is a vertex in
V. Term graphs may have infinitely many vertices.

Let G be a term graph over signature Σ. As useful notation for
picking out an arbitrary vertex, or the i-th vertex, from among the
ordered successors of a vertex v in G, we define for each i ∈ ℕ
the indexed edge relation ↣_i ⊆ V × V, and additionally the (not
indexed) edge relation ↣ ⊆ V × V, by stipulating for all w, w' ∈ V:

\[
\begin{aligned}
  w \rightarrowtail_i w' \;&:\Longleftrightarrow\; \exists\, w_0, \ldots, w_n \in V.\ \mathit{args}(w) = w_0 \cdots w_n \,\wedge\, w' = w_i \\
  w \rightarrowtail w' \;&:\Longleftrightarrow\; \exists\, i \in \mathbb{N}.\ w \rightarrowtail_i w'
\end{aligned}
\]

A path in G is described by w₀ ↣_{k₁} w₁ ↣_{k₂} ⋯ ↣_{k_n} w_n, where
w₀, w₁, …, w_n ∈ V and n, k₁, k₂, …, k_n ∈ ℕ. An access path of a
vertex w of G is a path that starts at the root of G, ends in w, and
does not visit any vertex twice. Access paths need not be unique. A
term graph is root-connected if every vertex has an access path.

² For definitions of horizontal, vertical, and twisted sharing we refer to [4].

Note: By a 'term graph' we will, from now on, always mean a 
root-connected term graph. 
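
The definition of term graphs and of root-connectedness can be made concrete as follows. This Python sketch is an illustration only: the class name `TermGraph`, the helper names, and the toy signature with a binary symbol `f` and a constant `a` are assumptions of this example, and it handles finite graphs only (whereas term graphs may in general be infinite).

```python
from dataclasses import dataclass


@dataclass
class TermGraph:
    lab: dict     # vertex -> symbol of the signature
    args: dict    # vertex -> tuple of ordered successor vertices
    root: object  # the root vertex


def respects_arities(g, ar):
    """Check |args(v)| = ar(lab(v)) for every vertex v."""
    return all(len(g.args[v]) == ar[g.lab[v]] for v in g.lab)


def reachable(g):
    """Vertices that can be reached from the root along edges."""
    seen, stack = {g.root}, [g.root]
    while stack:
        v = stack.pop()
        for w in g.args[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_root_connected(g):
    """Every vertex has an access path iff it is reachable from the root."""
    return reachable(g) == set(g.lab)


ar = {"f": 2, "a": 0}  # toy signature (assumption of this sketch)

# f(a, a) with the constant a shared: both argument edges hit vertex 1.
dag = TermGraph(lab={0: "f", 1: "a"}, args={0: (1, 1), 1: ()}, root=0)

# The same graph with an extra vertex 2 that no edge points to.
with_orphan = TermGraph(lab={0: "f", 1: "a", 2: "a"},
                        args={0: (1, 1), 1: (), 2: ()}, root=0)
```

A vertex is reachable from the root exactly if it has an access path, since any path from the root can be shortened to one that visits no vertex twice.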

Let G₁ = ⟨V₁, lab₁, args₁, r₁⟩ and G₂ = ⟨V₂, lab₂, args₂, r₂⟩ be
term graphs over signature Σ, in the sequel.

A bisimulation between G₁ and G₂ is a relation R ⊆ V₁ × V₂
such that the following conditions hold, for all (w, w') ∈ R:

\[
\left.
\begin{array}{rcll}
  (r_1, r_2) & \in & R & \text{(roots)} \\
  \mathit{lab}_1(w) & = & \mathit{lab}_2(w') & \text{(labels)} \\
  (\mathit{args}_1(w), \mathit{args}_2(w')) & \in & R^* & \text{(arguments)}
\end{array}
\right\} \qquad (1)
\]

where the extension R* ⊆ V₁* × V₂* of R to a relation between words
over V₁ and words over V₂ is defined as:

\[
R^* := \{\, (w_1 \cdots w_k,\; w'_1 \cdots w'_k) \mid
  w_1, \ldots, w_k \in V_1,\ w'_1, \ldots, w'_k \in V_2,\
  \text{for } k \in \mathbb{N} \text{ such that } (w_i, w'_i) \in R
  \text{ for all } 1 \le i \le k \,\}.
\]

We write G₁ ↔̲ G₂ if there is a bisimulation between G₁ and G₂,
and we say, in this case, that G₁ and G₂ are bisimilar. Bisimilarity
↔̲ is an equivalence relation on term graphs.

A functional bisimulation from G₁ to G₂ is a bisimulation
that is the graph of a function from V₁ to V₂. An alternative
characterisation of this concept is that of homomorphism from G₁
to G₂: a morphism from the structure G₁ to the structure G₂, that
is, a function h : V₁ → V₂ such that, for all v ∈ V₁ it holds:

\[
\left.
\begin{array}{rcll}
  h(r_1) & = & r_2 & \text{(roots)} \\
  \mathit{lab}_1(v) & = & \mathit{lab}_2(h(v)) & \text{(labels)} \\
  h^*(\mathit{args}_1(v)) & = & \mathit{args}_2(h(v)) & \text{(arguments)}
\end{array}
\right\} \qquad (2)
\]

where h* is the homomorphic extension h* : V₁* → V₂*,
v₁⋯v_n ↦ h(v₁)⋯h(v_n) of h to words over V₁. We write G₁ →̲ G₂ if there
is a functional bisimulation (a homomorphism) from G₁ to G₂. An
isomorphism between G₁ and G₂ is a bijective homomorphism
i : V₁ → V₂ from G₁ to G₂. If there is an isomorphism between G₁
and G₂, we write G₁ ≃ G₂, and say that G₁ and G₂ are isomorphic.

Let f ∈ Σ. An f-homomorphism between G₁ and G₂ is a
homomorphism h between G₁ and G₂ that shares only vertices
with the label f: h(w₁) = h(w₂) ⇒ lab₁(w₁) = lab₁(w₂) = f
holds for all distinct w₁, w₂ ∈ V₁. If this is the case, we write
G₁ →̲^f G₂. An f-bisimulation between G₁ and G₂ is a bisimulation
between G₁ and G₂ such that its restriction to vertices with labels
different from f is a bijective function. We use ↔̲^f to indicate
f-bisimilarity.

The relation →̲ is a preorder, the sharing preorder on the class
of term graphs over a given signature Σ. It induces a partial order
on the isomorphism equivalence classes of term graphs over Σ.

Let G = ⟨V, lab, args, r⟩ be a term graph. A bisimulation
collapse of G is a maximal element in the class {G' | G →̲ G'} up
to ≃, that is, a term graph G'₀ with G →̲ G'₀ such that if G'₀ →̲ G''₀
for some term graph G''₀, then G''₀ ≃ G'₀. The canonical bisimulation
collapse G|↓ of G is defined as the root-connected part of the 'factor
term graph' G/R of G with respect to the largest bisimulation R
between G and G (the largest 'self-bisimulation' on G), which
is an equivalence relation on V. The factor term graph G/∼ of
G with respect to an equivalence relation ∼ on V is defined as
G/∼ := ⟨V/∼, lab/∼, args/∼, [r]_∼⟩ where V/∼ is the set of
∼-equivalence classes of vertices in V, [r]_∼ is the ∼-equivalence
class of r, and lab/∼ and args/∼ are the mappings on V/∼ that are
induced by lab and args, respectively. Every two bisimulation
collapses of G are isomorphic. This justifies the common abbreviation
of saying that 'the bisimulation collapse' of G is unique up to
isomorphism.

3. Unfolding Semantics of λ_letrec-terms

Informally, we regard λ_letrec-terms as being defined by the following
grammar:

    L ::= λx. L                        (abstraction)
        | L L                          (application)
        | x                            (variable)
        | let B in L                   (letrec)

    B ::= f₁ = L, …, f_n = L           (equations)
          (f₁, …, f_n ∈ ℛ all distinct)


Formally, we consider λ_letrec-terms to be defined correspondingly as
terms in the formalism of Combinatory Reduction Systems (CRS)
[30]. CRSs are a higher-order term rewriting framework tailor-made
for formalising and manipulating expressions in higher-order
languages (i.e. languages with binding constructs like λ-abstractions
and let-bindings). They provide a sound basis for defining our
language and for reasoning with letrec-expressions. By formalising
a system of unfolding rules as a CRS we conveniently externalise
issues like name capturing and α-renaming, which otherwise would
have to be handled by a calculus of explicit substitution. Also, we
can lean on the rewriting theory of CRSs for the proofs.

As CRS-signature we use Σ_λletrec = Σ_λ ∪ {let_n, rec-in_n | n ∈ ℕ},
with Σ_λ = {abs, app}, where the unary symbol abs and the binary
symbol app represent λ-abstraction and application, respectively;
the symbols let_n of arity one, and rec-in_n of arity n + 1 together
formalise let-expressions with n bindings. By |L| we denote the
size (the number of symbols) of a λ_letrec-term L. By Ter(λ_letrec) we
denote the set of CRS-terms over Σ_λletrec. For readability, we rely on
the informal first-order notation.

Infinite λ-terms are formalised as iCRS-terms (terms in an
infinitary CRS [22]) over Σ_λ, forming the set Ter(λ∞). Informally,
infinite λ-terms are generated co-inductively by the alternatives
(abstraction), (application), and (variable) of the grammar above.

In order to formally define the infinite unfolding of λ_letrec-terms
we utilise a CRS whose rewrite rules formalise unfolding steps [11].
Every λ_letrec-term L that represents an infinite λ-term M can be
rewritten by a typically infinite rewrite sequence that converges
to M in the limit. However, not every λ_letrec-term represents a
λ∞-term. For instance the λ_letrec-term Q = λx. let f = f in f x with
a meaningless let-binding for f does not unfold to a λ∞-term.
Therefore we introduce a constant symbol •, called 'black hole', for
expressing meaningless bindings, in order to define the unfolding
operation as a total function. The unfolding semantics of Q will then
be λx. • x. So we extend the signature Σ_λ to Σ_λ•, including •, and
denote the set of infinite λ-terms over Σ_λ• by Ter(λ∞_•). Similarly,
the rules below are defined for terms in Ter(λ_letrec,•) based on
signature Σ_λletrec,• that extends Σ_λletrec by the blackhole constant.

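Definition 3.1 (unfolding CRS for λ_letrec-terms). The rules:

\[
\begin{array}{lrcl}
(@) & \mathsf{let}\ B\ \mathsf{in}\ L_0\, L_1
  & \to & (\mathsf{let}\ B\ \mathsf{in}\ L_0)\ (\mathsf{let}\ B\ \mathsf{in}\ L_1) \\
(\lambda) & \mathsf{let}\ B\ \mathsf{in}\ \lambda x.\, L_0
  & \to & \lambda x.\, \mathsf{let}\ B\ \mathsf{in}\ L_0 \\
(\text{let-in}) & \mathsf{let}\ B_0\ \mathsf{in}\ \mathsf{let}\ B_1\ \mathsf{in}\ L
  & \to & \mathsf{let}\ B_0, B_1\ \mathsf{in}\ L \\
(\text{let-rec}) & \mathsf{let}\ B_1, f = L, B_2\ \mathsf{in}\ f
  & \to & \mathsf{let}\ B_1, f = L, B_2\ \mathsf{in}\ L \\
(\text{gc}) & \mathsf{let}\ f_1 = L_1, \ldots, f_n = L_n\ \mathsf{in}\ P
  & \to & P \\
 & \multicolumn{3}{l}{\quad \text{(if } f_1, \ldots, f_n \text{ do not occur in } P\text{)}} \\
(\text{tighten}) & \mathsf{let}\ B_1, f = g, B_2\ \mathsf{in}\ L
  & \to & \mathsf{let}\ B_1[f := g], B_2[f := g]\ \mathsf{in}\ L[f := g] \\
 & \multicolumn{3}{l}{\quad \text{(where } g \text{ with } g \neq f \text{ is a recursion variable in } B_1 \text{ or } B_2\text{)}} \\
(\bullet) & \mathsf{let}\ B_1, f = f, B_2\ \mathsf{in}\ L
  & \to & \mathsf{let}\ B_1, f = \bullet, B_2\ \mathsf{in}\ L
\end{array}
\]

define, in informal notation, the unfolding CRS for λ_letrec-terms with
rewrite relation →_unf. Here is the CRS-notation for two of the rules:

\[
\begin{array}{ll}
(\lambda) & \mathsf{let}_n([\vec{f}\,]\, \text{rec-in}_n(X_1(\vec{f}), \ldots, X_n(\vec{f}), \mathsf{abs}([x]\, Z(\vec{f}, x)))) \\
 & \quad \to\ \mathsf{abs}([x]\, \mathsf{let}_n([\vec{f}\,]\, \text{rec-in}_n(X_1(\vec{f}), \ldots, X_n(\vec{f}), Z(\vec{f}, x)))) \\[1ex]
(\text{let-in}) & \mathsf{let}_n([\vec{f}\,]\, \text{rec-in}_n(\vec{X}(\vec{f}), \mathsf{let}_m([\vec{g}\,]\, \text{rec-in}_m(\vec{Y}(\vec{f}, \vec{g}), Z(\vec{f}, \vec{g}))))) \\
 & \quad \to\ \mathsf{let}_{n+m}([\vec{f}\,\vec{g}\,]\, \text{rec-in}_{n+m}(\vec{X}(\vec{f}), \vec{Y}(\vec{f}, \vec{g}), Z(\vec{f}, \vec{g})))
\end{array}
\]
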
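Example 3.2 (Unfolding derivation of L from Ex. 1.2).

\[
\begin{array}{rl}
\lambda f.\, \mathsf{let}\ r = f\, r\ \mathsf{in}\ r
  & \to_{(\text{let-rec})}\ \lambda f.\, \mathsf{let}\ r = f\, r\ \mathsf{in}\ f\, r \\
  & \to_{(@)}\ \lambda f.\, (\mathsf{let}\ r = f\, r\ \mathsf{in}\ f)\ (\mathsf{let}\ r = f\, r\ \mathsf{in}\ r) \\
  & \to_{(\text{gc})}\ \lambda f.\, f\ (\mathsf{let}\ r = f\, r\ \mathsf{in}\ r) \\
  & \to_{(\text{let-rec})}\ \ldots
\end{array}
\]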

We say that a λ_letrec-term L unfolds to a λ∞_•-term M, or that L
expresses M, if there is a (typically) infinite →_unf-rewrite sequence
from L that converges to M, symbolically L ↠_unf M. Note that
any such rewrite sequence is strongly convergent (the depth of the
contracted redexes tends to infinity), because the resulting term does
not contain any let-expressions.
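
Finite approximations of the infinite unfolding can be computed directly. The following Python sketch is a rough illustration and not the unfolding CRS itself: it uses an environment of bindings instead of rewrite steps, the class names (`Var`, `Lam`, `App`, `Let`) and the function `unfold` are introduced here, all bound names are assumed to be distinct (so that name capture and shadowing do not arise), and a cycle of bindings that passes no abstraction or application is rendered as the black hole •.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Let:
    binds: tuple  # tuple of (recursion variable, term) pairs
    body: object


def unfold(t, depth, env=None, pending=frozenset()):
    """Render the unfolding of t, cut off below `depth` constructors."""
    env = {} if env is None else env
    if isinstance(t, Var):
        if t.name not in env:
            return t.name                  # abstraction variable
        if t.name in pending:
            return "•"                     # unguarded cycle: black hole
        rhs, rhs_env = env[t.name]
        return unfold(rhs, depth, rhs_env, pending | {t.name})
    if isinstance(t, Let):
        new_env = dict(env)
        for name, rhs in t.binds:
            new_env[name] = (rhs, new_env)  # bindings can see each other
        return unfold(t.body, depth, new_env, pending)
    if depth == 0:
        return "…"
    if isinstance(t, Lam):
        inner = {k: v for k, v in env.items() if k != t.var}
        return f"λ{t.var}. {unfold(t.body, depth - 1, inner)}"
    return f"({unfold(t.fun, depth - 1, env)} {unfold(t.arg, depth - 1, env)})"


# The terms L and P from Ex. 1.2, and the term Q with a meaningless binding.
L = Lam("f", Let((("r", App(Var("f"), Var("r"))),), Var("r")))
P = Lam("f", Let((("r", App(Var("f"), App(Var("f"), Var("r")))),), Var("r")))
Q = Lam("x", Let((("f", Var("f")),), App(Var("f"), Var("x"))))
```

Comparing `unfold(L, d)` with `unfold(P, d)` for growing `d` gives evidence for, but of course no proof of, unfolding equivalence. Deciding it is the purpose of the bisimilarity check on term graph interpretations.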

Lemma 3.3. Every λ_letrec-term unfolds to precisely one λ∞_•-term.

Proof (Outline). Infinite normal forms of →_unf are λ∞_•-terms since:
every occurrence of a let-expression in a λ_letrec,•-term gives rise
to a redex; and infinite λ_letrec,•-terms without let-expressions are
λ∞_•-terms. Also, outermost-fair rewrite sequences in which the rules
(tighten) and (•) are applied eagerly are (strongly) convergent.

Unique infinite normalisation of →_unf follows from finitary
confluence of →_unf. In previous work [11] we proved confluence for
the slightly simpler CRS without the final two rules, which together
introduce black holes in terms with meaningless bindings. That
confluence proof can be adapted by extending the argumentation to
deal with the additional critical pairs. □

Definition 3.4. The unfolding semantics for λ_letrec-terms is defined
by the function ⟦·⟧_λ∞ : Ter(λ_letrec) → Ter(λ∞_•), where
L ↦ ⟦L⟧_λ∞ := the infinite unfolding of L.

Remark 3.5 (Regular and strongly regular λ∞-terms). λ∞-terms that
arise as infinite unfoldings of λ_letrec-terms form a proper subclass
of those λ∞-terms that have a regular term structure [11]. λ∞-terms
that belong to this subclass are called 'strongly regular', and can be
characterised by means of a decomposition rewrite system, and as
those that contain only finite 'binding-capturing chains' [11, 13].

4. Lambda higher-order term graphs 

In this section we motivate the use of higher-order term graphs as a
semantics for λ_letrec-terms; we introduce the class H of
'λ-ho-term-graphs' and define the semantics ⟦·⟧_H for interpreting
λ_letrec-terms as λ-ho-term-graphs. Finally, we sketch a proof of the
correctness of ⟦·⟧_H with respect to unfolding equivalence (the
property (P1)).



We start out from a natural interpretation of λ_letrec-terms as
first-order term graphs: occurrences of abstraction variables are
resolved as edges pointing to the corresponding abstraction;
occurrences of recursion variables as edges to the subgraph belonging
to the respective binding. We therefore consider term graphs over the
signature Σ^λ = {@, λ, 0, •} with arities ar(@) = 2, ar(λ) = 1,
ar(0) = 1, and ar(•) = 0. These function symbols represent
applications, λ-abstractions, abstraction variables, and black holes.

We will later define a subclass of these term graphs that excludes
meaningless graphs. In line with the choice to regard all terms
as higher-order terms (thus modulo α-conversion), we consider a
nameless graph representation, so that α-equivalence of two terms
can be recognised as their graph interpretations being isomorphic.

For a term graph G over Σ^λ with set V of vertices we will
henceforth denote by V(@), V(λ), V(0), and V(•) the sets of
application vertices, abstraction vertices, variable vertices, and
blackhole vertices, that is, those with label @, λ, 0, •, respectively.



Example 4.1 (Natural first-order interpretation). The λ_letrec-terms
L and P in Ex. 1.2 can be represented as the term graphs in Fig. 2.

These two graphs are bisimilar, which suggests that L and P are 
unfolding-equivalent. Moreover, there is a functional bisimulation 
from the larger term graph to the smaller one, indicating that L 
expresses more sharing than P, or in other words: L is more compact. 
Also, there is no smaller term graph that is bisimilar to L and P. We 
conclude that L is a maximally shared form of P. 
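
The functional bisimulation mentioned in this example can be checked mechanically. The Python sketch below is an illustration under assumptions: graphs are encoded as triples `(lab, args, root)` of hand-built dictionaries for the natural first-order interpretations of P and L, and the function name `is_homomorphism` as well as all vertex names are introduced here. It tests the three homomorphism conditions (roots), (labels), and (arguments) from the Preliminaries.

```python
def is_homomorphism(h, g1, g2):
    """Check the conditions (roots), (labels), (arguments) for h : V1 -> V2."""
    lab1, args1, r1 = g1
    lab2, args2, r2 = g2
    return (h[r1] == r2
            and all(lab1[v] == lab2[h[v]] for v in lab1)
            and all(tuple(h[w] for w in args1[v]) == args2[h[v]]
                    for v in lab1))


# Natural first-order graph of P = λf. let r = f (f r) in r.
G_P = ({"lam": "λ", "app1": "@", "var1": "0", "app2": "@", "var2": "0"},
       {"lam": ("app1",), "app1": ("var1", "app2"), "var1": ("lam",),
        "app2": ("var2", "app1"), "var2": ("lam",)},
       "lam")

# Natural first-order graph of L = λf. let r = f r in r.
G_L = ({"lam": "λ", "app": "@", "var": "0"},
       {"lam": ("app",), "app": ("var", "app"), "var": ("lam",)},
       "lam")

# Candidate map from the larger graph to the smaller one.
h = {"lam": "lam", "app1": "app", "app2": "app", "var1": "var", "var2": "var"}

# A candidate in the opposite direction, which fails on (arguments).
h_back = {"lam": "lam", "app": "app1", "var": "var1"}
```

Here `h` identifies the two application vertices and the two variable vertices of the larger graph, which is exactly the additional sharing that L expresses over P.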

However, this translation is incorrect in the sense that bisimilarity
does not in general guarantee unfolding equivalence, the desired
property (P1). This is witnessed by the following counterexample.

Example 4.2 (Incorrectness of the natural first-order interpretation).

    L₁ = let f = λx. (λy. f y) x in f
    L  = let f = λx. f x in f
    L₂ = let f = λx. (λy. f x) x in f

While ⟦L₁⟧_λ∞ = ⟦L⟧_λ∞ and ⟦L⟧_λ∞ ≠ ⟦L₂⟧_λ∞, all of their term
graphs G₁, G, G₂ are bisimilar (please ignore the shading for now):

[Figure: the term graphs G₁, G, and G₂, with scopes indicated by shading.]

Consequently this interpretation lacks the necessary structure for 
correctly modelling compactification via bisimulation collapse. 
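
The bisimilarity claimed in Ex. 4.2 can be reproduced with a few lines of code. The following Python sketch rests on assumptions made for this illustration: the three graphs are hand-built encodings of the natural first-order interpretations of L₁, L, and L₂ as triples `(lab, args, root)`, the vertex names are arbitrary, and `bisimilar` is a simple pair exploration that is adequate because term graphs have ordered, deterministic successors.

```python
def bisimilar(g1, g2):
    """Explore the pairs reachable from the roots along equal edge indices.

    For well-formed term graphs over one signature, the graphs are
    bisimilar iff all reachable pairs carry equal labels.
    """
    lab1, args1, r1 = g1
    lab2, args2, r2 = g2
    seen, todo = set(), [(r1, r2)]
    while todo:
        v, w = todo.pop()
        if (v, w) in seen:
            continue
        if lab1[v] != lab2[w]:
            return False
        seen.add((v, w))
        todo.extend(zip(args1[v], args2[w]))
    return True


# L = let f = λx. f x in f
G = ({"lx": "λ", "a": "@", "x": "0"},
     {"lx": ("a",), "a": ("lx", "x"), "x": ("lx",)},
     "lx")

# L1 = let f = λx. (λy. f y) x in f
G1 = ({"lx": "λ", "a": "@", "ly": "λ", "b": "@", "x": "0", "y": "0"},
      {"lx": ("a",), "a": ("ly", "x"), "ly": ("b",), "b": ("lx", "y"),
       "x": ("lx",), "y": ("ly",)},
      "lx")

# L2 = let f = λx. (λy. f x) x in f: the inner variable points to λx.
G2 = ({"lx": "λ", "a": "@", "ly": "λ", "b": "@", "x1": "0", "x2": "0"},
      {"lx": ("a",), "a": ("ly", "x1"), "ly": ("b",), "b": ("lx", "x2"),
       "x1": ("lx",), "x2": ("lx",)},
      "lx")
```

All three graphs pass the check pairwise, although only L₁ and L are unfolding-equivalent. The back-edges of variable vertices are not enough for bisimulation to distinguish which abstraction binds a variable.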

We therefore impose additional structure on the term graphs. 
This is indicated by the shading in the picture above, and in the 
graphs throughout this paper. A shaded area depicts the scope of an 
abstraction: it comprises all positions between the abstraction and its 
bound variable occurrences as well as the scope of any abstraction 
on these positions. By this stipulation, scopes are properly nested. 

Now note that the functional bisimulation on the right in the
picture in Ex. 4.2 does not respect the scopes: The scope of the
topmost abstraction vertex in the term graph G₂ interpreting L₂
contains another λ-abstraction; hence the image of this scope under
the functional bisimulation cannot fit into, and is not contained in,
the single scope in the term graph G of L. Also, the trivial scope of
the vacuous abstraction in G₂ is not mapped to a scope in G. Thus
the natural first-order interpretation is incorrect, in the sense that
functional bisimulation does not preserve scopes on the first-order
term graphs that are interpretations of λ_letrec-terms.

To prevent interpretations of terms that are not unfolding-equivalent,
like L1 and L2 in Ex. 4.2, from becoming bisimilar, we enrich first-order
term graphs by a formal concept of scope. More precisely, abstraction
prefixes are added as vertex labels. They also serve the purpose
of defining the subclass of meaningful term graphs over Σ^λ that
sensibly represent cyclic λ-terms. In the enriched term graphs, each
vertex v is annotated with a label P(v), the abstraction prefix of v,
which is a list of vertex names that identifies the abstraction vertices
in whose scope v resides. Alternatively, scopes can be represented by
a scope function (as in [14]) that assigns to every abstraction vertex
the set of vertices in its scope. In the article [12] we show that
higher-order term graphs with scope functions correspond bijectively to
those with abstraction prefix functions.

Abstraction prefixes can be determined by traversing the graph
and recording every binding encountered. When passing an
abstraction vertex v while descending into the subgraph representing
the body of the abstraction, one enters or opens the scope of v. This
is recorded by appending v to the abstraction prefix of v's successor.
The vertex v is removed from the prefix at positions under which the
abstraction variable is no longer used, but not before any other variable
that was added to the prefix in the meantime has itself been removed.
In other words, the abstraction prefix behaves like a stack. We call
term graphs for representing λletrec-terms that are equipped with
abstraction prefixes 'λ-higher-order term graphs' (λ-ho-term-graphs).



Example 4.3 (The λ-ho-term-graphs of the terms in Ex. 4.2).

[Figure: the λ-ho-term-graphs of L1, L, and L2; every vertex is annotated with its abstraction prefix]

The superscripts of abstraction vertices indicate their names. The 
abstraction prefix of a vertex is annotated to its top left. Note that 
abstraction vertices themselves are not included in their own prefix. 

We define λ-ho-term-graphs as term graphs over Σ^λ together
with an abstraction-prefix function that assigns to each vertex an
abstraction prefix. This function has to respect certain correctness
conditions, which restrict the λ-ho-term-graphs so as to exclude
meaningless term graphs.

Definition 4.4 (correct abstraction-prefix function for term graphs
over Σ^λ). Let G = (V, lab, args, r) be a Σ^λ-term-graph.

An abstraction-prefix function for G is a function P : V → V*
from vertices of G to words of vertices. Such a function is called
correct if for all w, w0, w1 ∈ V and k ∈ {0, 1} it holds:

P(r) = ε   (root)

P(•) = ε   (black hole)

w ∈ V(λ) ∧ w ↣_0 w0  ⇒  P(w0) ≤ P(w) w   (λ)

w ∈ V(@) ∧ w ↣_k wk  ⇒  P(wk) ≤ P(w)   (@)

w ∈ V(0) ∧ w ↣_0 w0  ⇒  w0 ∈ V(λ) ∧ P(w0) w0 = P(w)   (0)

Here and later we denote by ≤ the 'is-prefix-of' relation.
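
The correctness conditions can be checked vertex by vertex. The following Python sketch illustrates this under assumptions made only for this example: a term graph is a dict from vertex names to a label ('lam', '@', 'var', 'bh') and a successor list, and prefixes are tuples of vertex names. The sample graph is the interpretation of L = let f = λx. f x in f from Ex. 4.2.

```python
def is_prefix(p, q):
    """True if the tuple p is a prefix of the tuple q (the relation ≤)."""
    return q[:len(p)] == p


def is_correct_prefix_function(graph, root, P):
    """Check the conditions (root), (black hole), (λ), (@), (0) of Def. 4.4."""
    if P[root] != ():
        return False
    for w, (label, succs) in graph.items():
        if label == 'bh' and P[w] != ():
            return False
        if label == 'lam' and not is_prefix(P[succs[0]], P[w] + (w,)):
            return False
        if label == '@' and not all(is_prefix(P[s], P[w]) for s in succs):
            return False
        if label == 'var':
            w0 = succs[0]
            if graph[w0][0] != 'lam' or P[w0] + (w0,) != P[w]:
                return False
    return True


# Term graph of L: abstraction v, application a, variable vertex x.
G = {'v': ('lam', ['a']), 'a': ('@', ['v', 'x']), 'x': ('var', ['v'])}
P_good = {'v': (), 'a': ('v',), 'x': ('v',)}
P_bad = {'v': (), 'a': ('v',), 'x': ()}   # variable outside the scope of v

print(is_correct_prefix_function(G, 'v', P_good))
print(is_correct_prefix_function(G, 'v', P_bad))
```

The second prefix function violates condition (0): the variable vertex x would lie outside the scope of the abstraction it refers to.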

Definition 4.5 (λ-ho-term-graph). A λ-ho-term-graph over Σ^λ is a
five-tuple 𝒢 = (V, lab, args, r, P) where G_𝒢 = (V, lab, args, r) is
a term graph over Σ^λ, called the term graph underlying 𝒢, and P
is a correct abstraction-prefix function for G_𝒢. The class of
λ-ho-term-graphs over Σ^λ is denoted by ℋ.

Definition 4.6 (homomorphism, bisimulation for λ-ho-term-graphs).
Let 𝒢1 = (V1, lab1, args1, r1, P1) and 𝒢2 = (V2, lab2, args2, r2, P2)
be λ-ho-term-graphs over Σ^λ.

A bisimulation between 𝒢1 and 𝒢2 is a relation R ⊆ V1 × V2 that
is a bisimulation between the term graphs G_𝒢1 and G_𝒢2 underlying
𝒢1 and 𝒢2, respectively, and for which also the following condition:

(P1(w), P2(w')) ∈ R*   (abstraction-prefix functions)   (3)

(for R* see p. 5, below (1)) is satisfied for all w ∈ V1 and all w' ∈ V2.
If there is such a bisimulation between 𝒢1 and 𝒢2, then we say that
𝒢1 and 𝒢2 are bisimilar, and denote this fact by 𝒢1 ↔̲ 𝒢2.

A homomorphism (a functional bisimulation) from 𝒢1 to 𝒢2
is a morphism from the structure 𝒢1 to the structure 𝒢2, or more
explicitly, it is a homomorphism h : V1 → V2 from G_𝒢1 to G_𝒢2 that
additionally satisfies, for all w ∈ V1, the following condition:

h*(P1(w)) = P2(h(w))   (abstraction-prefix functions)   (4)

where h* is the homomorphic extension of h to words over V1. We
write 𝒢1 →̲ 𝒢2 if there is a homomorphism from 𝒢1 to 𝒢2.
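
Condition (4) is what rules out the unwanted identification in Ex. 4.2. The following Python sketch checks whether a given vertex map is a homomorphism, with or without the abstraction-prefix condition. The encoding (dicts of labels and successor lists, prefixes as tuples), the vertex names, and the concrete eager-scope prefixes chosen for the graphs of L1, L, and L2 are assumptions made here for illustration.

```python
def is_homomorphism(g1, r1, P1, g2, r2, P2, h, with_prefixes=True):
    """Check that h maps root to root, preserves labels and successor
    lists, and (optionally) satisfies h*(P1(w)) = P2(h(w))."""
    if h[r1] != r2:
        return False
    for w, (label, succs) in g1.items():
        label2, succs2 = g2[h[w]]
        if label != label2 or [h[s] for s in succs] != list(succs2):
            return False
        if with_prefixes and tuple(h[v] for v in P1[w]) != P2[h[w]]:
            return False
    return True


# L = let f = λx. f x in f
G = {'v': ('lam', ['a']), 'a': ('@', ['v', 'x']), 'x': ('var', ['v'])}
P = {'v': (), 'a': ('v',), 'x': ('v',)}

# L1 = let f = λx. (λy. f y) x in f   (the scope of x does not include λy)
G1 = {'v': ('lam', ['a']), 'a': ('@', ['u', 'x']), 'u': ('lam', ['b']),
      'b': ('@', ['v', 'y']), 'x': ('var', ['v']), 'y': ('var', ['u'])}
P1 = {'v': (), 'a': ('v',), 'u': (), 'b': ('u',), 'x': ('v',), 'y': ('u',)}
h1 = {'v': 'v', 'a': 'a', 'u': 'v', 'b': 'a', 'x': 'x', 'y': 'x'}

# L2 = let f = λx. (λy. f x) x in f   (λy lies inside the scope of x)
G2 = {'v': ('lam', ['a']), 'a': ('@', ['u', 'x2']), 'u': ('lam', ['b']),
      'b': ('@', ['v', 'x1']), 'x1': ('var', ['v']), 'x2': ('var', ['v'])}
P2 = {'v': (), 'a': ('v',), 'u': ('v',), 'b': ('v',),
      'x1': ('v',), 'x2': ('v',)}
h2 = {'v': 'v', 'a': 'a', 'u': 'v', 'b': 'a', 'x1': 'x', 'x2': 'x'}

print(is_homomorphism(G1, 'v', P1, G, 'v', P, h1))
print(is_homomorphism(G2, 'v', P2, G, 'v', P, h2, with_prefixes=False))
print(is_homomorphism(G2, 'v', P2, G, 'v', P, h2))
```

On the underlying term graphs the map h2 is a homomorphism, but it fails condition (4) at the inner abstraction of G2, whose prefix is non-empty while the prefix of its image is empty.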



4.1 Interpretation of λletrec-terms as λ-ho-term-graphs

In order to interpret a λletrec-term L as a λ-ho-term-graph, the
translation rules ℛ from Fig. 3 are applied to a 'translation box' (*[]) L.
It contains L furnished with a prefix consisting of a dummy variable
*, and an empty set [] of binding equations. The translation process
proceeds by induction on the syntactical structure of the prefixed
λletrec-expression's body. Ultimately, a term graph G over Σ^λ is
produced, together with a correct abstraction-prefix function for G.

For reading the rules ℛ in Fig. 3 correctly, observe the details
described below. Illustrations of the translation process when
applied to two λletrec-terms used here can be found in Appendix A
of the extended version [15] of this paper.



• A translation box (p) L contains a prefixed, partially decomposed
λletrec-term L. The prefix contains a vector p of annotated
λ-abstractions that have already been translated and whose scope
typically extends into L. Every prefix abstraction is annotated
with a set of binding equations that are defined at its level. There
is a special dummy variable, denoted by *, at the left of the prefix
that carries top-level function bindings, i.e. binding equations
that are not defined under any enclosing λ-abstraction. The
λ-rule strips off an abstraction from the body of the expression,
and pushes the abstraction variable into the prefix, where it is
initially annotated with an empty set of function bindings.

• Names of abstraction vertices are indicated to the right, and
abstraction-prefixes to the left of the created vertices. In order to
refer to the vertices in the prefix we use the following notation:
vs(p) = v1 ⋯ vn if p = *[B0] x1^v1[B1] … xn^vn[Bn].

• Vertices drawn with dashed lines have been created in earlier 
translation steps, and are referenced by edges in the current step. 

• In the S-rule, which takes care of closing scopes, FV(L) stands 
for the set of free variables in L. 

• The let-rule for translating let-expressions creates a box for the 
in-part as well as for each function binding. The translation of 
each of the bindings starts with an indirection vertex. These 
vertices guarantee the well-definedness of the process when it 
translates meaningless bindings such as f = f, or g = h, h = g,
which would otherwise give rise to loops without vertices. The 
let-rule pushes the function bindings into the abstraction prefix, 
associating each function binding with one of the variables in the 
abstraction prefix. There is some freedom as to which variable 
a function binding is assigned to. This freedom is limited by 
scoping conditions that ensure that the prefixed term is a valid 
CRS-term: function bindings may only depend on variables and 
functions that occur further to the left in the prefix. The chosen 
association also directly determines the prefix lengths used in 
the translation boxes for the function bindings. 

• Indirection vertices are eliminated by an erasure process at the 
end: Every indirection vertex that does not point to itself is 
removed, redirecting all incoming edges to the successor vertex. 
Finally every loop on a single indirection vertex is replaced 
by a black hole vertex that represents a meaningless binding. 
Abstraction prefixes for such black holes are defined to be empty. 
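
The erasure of indirection vertices described in the last item can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; the graph encoding and the labels 'ind' (indirection vertex, one successor) and 'bh' (black hole) are assumptions made here.

```python
def erase_indirections(graph, root):
    """Erase indirection vertices (label 'ind', exactly one successor)."""
    g = {v: (lab, list(succs)) for v, (lab, succs) in graph.items()}
    # Remove every indirection vertex that does not point to itself and
    # redirect its incoming edges to its successor.
    for v in list(g):
        lab, succs = g[v]
        if lab == 'ind' and succs[0] != v:
            target = succs[0]
            del g[v]
            for u in g:
                l, ss = g[u]
                g[u] = (l, [target if s == v else s for s in ss])
            if root == v:
                root = target
    # Every remaining indirection vertex is a loop on itself: it stands
    # for a meaningless binding and becomes a black hole.
    for v in g:
        if g[v][0] == 'ind':
            g[v] = ('bh', [])
    return g, root


# let f = λx. f x in f, with an indirection vertex i for the binding of f
g_L = {'i': ('ind', ['v']), 'v': ('lam', ['a']),
       'a': ('@', ['i', 'x']), 'x': ('var', ['v'])}
# let g = h, h = g in g: a loop that consists of indirection vertices only
g_loop = {'g': ('ind', ['h']), 'h': ('ind', ['g'])}

print(erase_indirections(g_L, 'i'))
print(erase_indirections(g_loop, 'g'))
```

For the meaningless bindings g = h, h = g, removing the first indirection vertex turns the second one into a loop on itself, which is then replaced by a black hole.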

Definition 4.7. We say that a term graph G over Σ^λ and an
abstraction-prefix function P are ℛ-generated from a λletrec-term L if G
and P are obtained by applying the rules ℛ from Fig. 3 to (*[]) L.

Proposition 4.8. Let L be a λletrec-term. Suppose that a term graph
G over Σ^λ and an abstraction-prefix function P are ℛ-generated
from L. Then P is a correct abstraction-prefix function for G, and
consequently, G and P together form a λ-ho-term-graph in ℋ.

There are two sources of non-determinism in this translation: the
S-rule for shortening prefixes can be applicable at the same time as
other rules; also, the let-rule does not fix the lengths l1, …, lk of the
abstraction prefixes for the translations of the binding equations, but
admits various choices of prefixes that are shorter than the prefix
of the left-hand side. Neither kind of non-determinism affects the
term graph that is produced, but in general several abstraction-prefix
functions, and thus different λ-ho-term-graphs, can be obtained.

[Figure 3: rule diagrams for the translation rules ℛ, including the S-rule and the let-rule, not reproduced]

Figure 3. Translation rules ℛ for interpreting λletrec-terms as λ-ho-term-graphs. See Section 4.1 for explanations.

Figure 4. Translation of λa. (λb. λc. a c) (λd. a d) with eager
scope-closure (left), and with lazy scope-closure (right). While in
the left term graph four vertices can be shared, resulting in the
translation of the term λa. let f = λc. a c in (λb. f) f, in the right
term graph only a single variable occurrence can be shared.

4.2 Interpretation as eager-scope λ-ho-term-graphs

Of the different translations of a λletrec-term into λ-ho-term-graphs
we are most interested in the one with the shortest possible
abstraction prefixes. We say that such a term graph has 'eager
scope-closure', or that it is 'eager-scope'. The reason for this choice is
illustrated in Fig. 4: eager scope-closure allows for more sharing.

Definition 4.9 (eager scope). Let 𝒢 = (V, lab, args, r, P) be a
λ-ho-term-graph. 𝒢 is called eager-scope if for every w ∈ V with
P(w) = p v for p ∈ V* and v ∈ V, there is a path w = w0 ↣ w1 ↣
⋯ ↣ wm ↣_0 v in 𝒢 from w to v with P(w) ≤ P(wi) for all
i ∈ {1, …, m}, and (this follows) wm ∈ V(0) and v ∈ V(λ).

Hence if a λ-ho-term-graph is not eager-scope, then it contains
a vertex w with abstraction prefix v1 … vn from which vn is only
reachable, if at all, by leaving the scope of vn. It can be shown that
in this case another abstraction-prefix function with shorter prefixes
exists, in which vn has been removed from the prefix of w.
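
The eager-scope condition amounts to a reachability test that never leaves the scope in question. The following Python sketch checks it for a λ-ho-term-graph given as a dict of labels and successor lists together with a prefix function; this encoding and the two prefix functions for the graph of L1 from Ex. 4.2 are assumptions made here for illustration.

```python
def is_prefix(p, q):
    """True if the tuple p is a prefix of the tuple q."""
    return q[:len(p)] == p


def is_eager_scope(graph, P):
    """Check the condition of Def. 4.9 for every vertex with a non-empty prefix."""
    for w in graph:
        if not P[w]:
            continue
        v = P[w][-1]                  # innermost abstraction whose scope contains w
        seen, todo, found = {w}, [w], False
        while todo and not found:
            u = todo.pop()
            label, succs = graph[u]
            if label == 'var' and succs[0] == v:
                found = True          # reached a variable occurrence bound by v
            for s in succs:
                # only follow edges to vertices whose prefix extends P(w)
                if s not in seen and is_prefix(P[w], P[s]):
                    seen.add(s)
                    todo.append(s)
        if not found:
            return False
    return True


# Term graph of L1 = let f = λx. (λy. f y) x in f
G1 = {'v': ('lam', ['a']), 'a': ('@', ['u', 'x']), 'u': ('lam', ['b']),
      'b': ('@', ['v', 'y']), 'x': ('var', ['v']), 'y': ('var', ['u'])}
P_eager = {'v': (), 'a': ('v',), 'u': (), 'b': ('u',),
           'x': ('v',), 'y': ('u',)}
P_lazy = {'v': (), 'a': ('v',), 'u': ('v',), 'b': ('v', 'u'),
          'x': ('v',), 'y': ('v', 'u')}

print(is_eager_scope(G1, P_eager), is_eager_scope(G1, P_lazy))
```

Both prefix functions are correct in the sense of Def. 4.4, but only the first is eager-scope: in the second, the inner abstraction u is kept in the scope of v although no occurrence of the variable of v can be reached from u without leaving that scope.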

Proposition 4.10 (eager-scope = minimal scope; uniqueness of
eager-scope λ-ho-term-graphs). Let 𝒢i = (V, lab, args, r, Pi) for
i ∈ {1, 2} be λ-ho-term-graphs with the same underlying term graph.
If 𝒢1 is eager-scope, then |P1(w)| ≤ |P2(w)| for all w ∈ V. If, in
addition, also 𝒢2 is eager-scope, then P1 = P2. Hence eager-scope
λ-ho-term-graphs over the same underlying term graph are unique.

Also, we will call a translation process 'eager-scope' if it resolves
the non-determinism in ℛ in such a way that it always yields
eager-scope λ-ho-term-graphs. In order to obtain an eager-scope
translation we have to consider the following aspects.

Garbage removal. In the presence of garbage, that is, unused function
bindings, a translation cannot be eager-scope. Consider the term
λx. λy. let f = x in y. The expendable binding f = x prevents the
application of the S-rule, and hence the closure of the scope of
λx, directly below λx. Therefore we will assume that all unused
function bindings are removed prior to applying the rules ℛ. A
λletrec-term without garbage will be called garbage-free.

Short enough prefix lengths in the let-rule. For obtaining an
eager-scope translation, we will usually stipulate that the S-rule is applied
eagerly, i.e. it is given precedence over the other rules. This is clearly
necessary for keeping the abstraction prefixes minimal. But how do
we choose the prefix lengths l1, …, lk in the let-rule? The prefix
lengths li determine at which position a binding fi = Li is inserted
into the abstraction prefixes. Therefore li may not be chosen too
short; otherwise a function f depending on a function g may end
up to the right of g, and hence may be removed from the prefix by
the S-rule prematurely, preventing completion of the translation. Yet
simply choosing li = n may prevent scopes from being minimal. For
example, when translating the term λa. λb. let f = a in a a (f a) b,
it is crucial to allow shorter prefixes for the binding than for the
in-part. As shown in Fig. 5, the graph on the left does not have eager
scope-closure even if the S-rule is applied eagerly. Consequently
the opportunity for sharing the lower application vertices is lost.
Figure 5. Translation of λa. λb. let f = a in a a (f a) b with equal
(left) and with minimal prefix lengths (right) in the let-rule.

Required variable analysis. For choosing the prefixes in the let-rule
correctly, the translation process must know for each function
binding which λ-variables are 'required' on the right-hand side of
its definition. For this we use an analysis obtaining the required
variables for positions in a λletrec-term as employed by algorithms
for lambda-lifting [7, 21]. The term 'required variables' was coined
by Morazán and Schultz [24]. A λ-variable x is called required at a
position p in a λletrec-term L if x is bound by an abstraction above
p, and has a free occurrence in the complete unfolding of L below p
(also recursion variables from above p are unfolded). The required
variables at position p in L can be computed as those λ-variables
with free occurrences that are reachable from p by a downwards
traversal with the stipulations: on encountering a let-binding the
in-part is entered; when encountering a recursion variable the traversal
continues at the right-hand side of the corresponding function
binding (even if it is defined above p).
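
The required variables can also be obtained by a small fixed-point computation that is equivalent in spirit to the traversal described above. The Python sketch below is an illustration under explicit assumptions, not the analysis used in the authors' implementation: terms are nested tuples ('var', x), ('lam', x, body), ('app', s, t), ('let', {f: rhs, ...}, body), all bound names (λ-variables and recursion variables) are assumed to be pairwise distinct, and a position is identified by passing the subterm found there.

```python
def collect_bindings(term, out):
    """Collect all function bindings f = rhs that occur anywhere in term."""
    kind = term[0]
    if kind == 'lam':
        collect_bindings(term[2], out)
    elif kind == 'app':
        collect_bindings(term[1], out)
        collect_bindings(term[2], out)
    elif kind == 'let':
        for f, rhs in term[1].items():
            out[f] = rhs
            collect_bindings(rhs, out)
        collect_bindings(term[2], out)
    return out


def free_lambda_vars(term, req):
    """Free λ-variables of term, unfolding recursion variables via req."""
    kind = term[0]
    if kind == 'var':
        name = term[1]
        return set(req[name]) if name in req else {name}
    if kind == 'lam':
        return free_lambda_vars(term[2], req) - {term[1]}
    if kind == 'app':
        return free_lambda_vars(term[1], req) | free_lambda_vars(term[2], req)
    return free_lambda_vars(term[2], req)   # let: only the in-part is entered


def required_variables(whole, sub):
    """Required λ-variables at the position of subterm sub in the term whole."""
    bindings = collect_bindings(whole, {})
    req = {f: set() for f in bindings}
    changed = True
    while changed:                           # least fixed point over all bindings
        changed = False
        for f, rhs in bindings.items():
            new = free_lambda_vars(rhs, req)
            if new != req[f]:
                req[f], changed = new, True
    return free_lambda_vars(sub, req)


# λa. λb. let f = a in a a (f a) b   (the term of Fig. 5)
a, b, f = ('var', 'a'), ('var', 'b'), ('var', 'f')
in_part = ('app', ('app', ('app', a, a), ('app', f, a)), b)
term = ('lam', 'a', ('lam', 'b', ('let', {'f': a}, in_part)))

print(required_variables(term, a))         # right-hand side of f
print(required_variables(term, in_part))   # in-part of the let
```

For the term of Fig. 5 only a is required on the right-hand side of f, whereas both a and b are required in the in-part, which is why the binding can be given a shorter prefix than the in-part.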

With the result of the required variable analysis at hand, we now
define properties of the translation process that can guarantee that
the resulting λ-ho-term-graph is eager-scope.

Definition 4.11 (eager-scope and minimal-prefix generated). Let L
be a λletrec-term, and let 𝒢 be a λ-ho-term-graph.

We say that 𝒢 is eager-scope ℛ-generated from L if 𝒢 is
ℛ-generated from L by a translation process with the following property:
for every translation box reached during the process with label
(p x^v[B]) P, where P is a subterm of L at position q, it holds that
if x is not a required variable at q in L, then in the next translation
step performed to this box either one of the rules f or let is applied,
or the prefix is shortened by the S-rule.

We say that 𝒢 is ℛ-generated with minimal prefixes from L
if 𝒢 is ℛ-generated from L by a translation process in which
minimal prefix lengths are achieved by giving applications of the
S-rule precedence over applications of all other rules, and by always
choosing prefixes minimally in applications of the let-rule.

Proposition 4.12. Let 𝒢 be a λ-ho-term-graph that is ℛ-generated
from a garbage-free λletrec-term L. The following statements hold:

(i) If 𝒢 is eager-scope ℛ-generated from L, then 𝒢 is eager-scope.

(ii) If 𝒢 is ℛ-generated with minimal prefixes from L, then 𝒢 is
eager-scope ℛ-generated from L, hence by (i) 𝒢 is eager-scope.

Definition 4.13. The semantics ⟦·⟧_ℋ of λletrec-terms as
λ-ho-term-graphs is defined as ⟦·⟧_ℋ : Ter(λletrec) → ℋ, L ↦ ⟦L⟧_ℋ := the
λ-ho-term-graph that is ℛ-generated with minimal prefixes from a
garbage-free version L' of L.

Proposition 4.14. For every λletrec-term L, ⟦L⟧_ℋ is eager-scope.

4.3 Correctness of ⟦·⟧_ℋ with respect to unfolding semantics

In preparation of establishing the desired property (P1) in Sect. 5, we
formulate, and outline the proof of, the fact that the semantics ⟦·⟧_ℋ
is correct with respect to the unfolding semantics on λletrec-terms.

Theorem 4.15. ⟦L1⟧_λ∞ = ⟦L2⟧_λ∞ if and only if ⟦L1⟧_ℋ ↔̲ ⟦L2⟧_ℋ,
for all λletrec-terms L1 and L2.



Sketch of Proof. Central for the proof are λ-ho-term-graphs that
have tree form and only contain variable backlinks, but no recursive
backlinks. They form the class ℋ_T ⊊ ℋ. Every 𝒢 ∈ ℋ has a unique
'tree unfolding' Tree(𝒢) ∈ ℋ_T. We make use of the following
statements. For all L, L1, L2 ∈ Ter(λletrec), M, M1, M2 ∈ Ter(λ∞),
𝒢, 𝒢1, 𝒢2 ∈ ℋ, and Tr, Tr1, Tr2 ∈ ℋ_T it can be shown that:

L1 →_unf L2  ⇒  ⟦L1⟧_ℋ ←̲ ⟦L2⟧_ℋ   (5)

L ↠_unf M (hence ⟦L⟧_λ∞ = M)  ⇒  ⟦L⟧_ℋ ←̲ ⟦M⟧_ℋ   (6)

⟦M⟧_ℋ ∈ ℋ_T   (7)

⟦M1⟧_ℋ ≅ ⟦M2⟧_ℋ  ⇒  M1 = M2   (8)

𝒢 ←̲ Tree(𝒢)   (9)

Tr1 ↔̲ Tr2  ⇒  Tr1 ≅ Tr2   (10)

𝒢1 ↔̲ 𝒢2  ⇒  Tree(𝒢1) ≅ Tree(𝒢2)   (11)

Hereby (5) is used for proving (6), and (9) with (10) for (11). Now
for proving the theorem, let L1 and L2 be arbitrary λletrec-terms.

"⇒": Suppose ⟦L1⟧_λ∞ = ⟦L2⟧_λ∞. Let M be the infinite unfolding
of L1 and L2, i.e., ⟦L1⟧_λ∞ = M = ⟦L2⟧_λ∞. Then by (6) it follows
⟦L1⟧_ℋ ←̲ ⟦M⟧_ℋ →̲ ⟦L2⟧_ℋ, and hence ⟦L1⟧_ℋ ↔̲ ⟦L2⟧_ℋ.

"⇐": Suppose ⟦L1⟧_ℋ ↔̲ ⟦L2⟧_ℋ. Then by (11) it follows that
Tree(⟦L1⟧_ℋ) ≅ Tree(⟦L2⟧_ℋ). Let M1, M2 ∈ Ter(λ∞) be
the infinite unfoldings of L1 and L2, i.e. M1 = ⟦L1⟧_λ∞, and
M2 = ⟦L2⟧_λ∞. Then (6) together with the assumption entails
⟦M1⟧_ℋ ↔̲ ⟦M2⟧_ℋ. Since ⟦M1⟧_ℋ, ⟦M2⟧_ℋ ∈ ℋ_T by (7), it
follows by (10) that ⟦M1⟧_ℋ ≅ ⟦M2⟧_ℋ. Finally, by using (8) we
get M1 = M2, and hence ⟦L1⟧_λ∞ = M1 = M2 = ⟦L2⟧_λ∞. □

5. Lambda term graphs

While modelling sharing expressed by λletrec-terms through
λ-ho-term-graphs facilitates comparisons via bisimilarity, it is not
immediately clear how the compactification of λ-ho-term-graphs via
the bisimulation collapse |↓ for λ-ho-term-graphs (which has to
respect scopes in the form of the abstraction-prefix functions) can be
computed efficiently. We therefore develop an implementation as
first-order term graphs, for which standard methods are available.

As witnessed by Ex. 4.2, the scoping information cannot just be
discarded, as functional bisimilarity on the underlying term graphs
does not faithfully implement functional bisimilarity on
λ-ho-term-graphs. Therefore the scoping information has to be incorporated
in the first-order interpretation, which we accomplish by extending
Σ^λ with S-vertices, scope delimiters, that signify the end of scopes.
When translating a λ-ho-term-graph into a first-order term graph,
S-vertices are placed along those edges in the underlying term graph
at which the abstraction prefix decreases in the λ-ho-term-graph.

Example 5.1 (Adding S-vertices). Consider the terms in Ex. 4.2 and
their λ-ho-term-graphs in Ex. 4.3. In the first-order interpretation
below, the shading is just for illustration purposes; it is not part of
the structure, and does not directly impair functional bisimulation.

[Figure: first-order term graphs with S-vertices for the terms of Ex. 4.2]

The addition of scope delimiters resolves the problem of Ex. 4.2.
They adequately represent the scoping information.

As for λ-ho-term-graphs, we will define correctness conditions
by means of an abstraction-prefix function. However, the current
approach with unary delimiter vertices leads to a problem.

Example 5.2 (S-backlinks). The term graph with scope delimiters
on the left admits a functional bisimulation that fuses two S-vertices
that close different scopes. We cannot hope to find a unique
abstraction prefix for the resulting fused S-vertex. This is remedied
on the right by using a variant representation that requires backlinks
from each S-vertex to the abstraction vertex whose scope it closes.
Then S-vertices can only be fused if the corresponding abstractions
have already been merged. Hence in the presence of S-backlinks, as
in the right illustration below, only the variable vertex can be shared.

[Figure: a term graph with unary S-vertices (left) and its variant with S-backlinks (right)]

Therefore we consider term graphs over the extension Σ^λ_S of
Σ^λ with a symbol S of arity 2; one edge targets the successor vertex,
the other is a backlink. We give correctness conditions, similar to
those for λ-ho-term-graphs, and define the arising class of 'λ-term-graphs'.

Definition 5.3 (correct abstraction-prefix function for term graphs
over Σ^λ_S). Let G = (V, lab, args, r) be a Σ^λ_S-term-graph.

An abstraction-prefix function P : V → V* on G is called
correct if for all w, w0, w1 ∈ V and k ∈ {0, 1} it holds:

P(r) = ε   (root)

P(•) = ε   (black hole)

w ∈ V(λ) ∧ w ↣_0 w0  ⇒  P(w0) = P(w) w   (λ)

w ∈ V(@) ∧ w ↣_k wk  ⇒  P(wk) = P(w)   (@)

w ∈ V(0) ∧ w ↣_0 w0  ⇒  w0 ∈ V(λ) ∧ P(w0) w0 = P(w)   (0)

w ∈ V(S) ∧ w ↣_0 w0  ⇒  P(w0) v = P(w) for some v ∈ V   (S)1

w ∈ V(S) ∧ w ↣_1 w1  ⇒  w1 ∈ V(λ) ∧ P(w1) w1 = P(w)   (S)2

While in λ-ho-term-graphs the abstraction prefix can shrink by
several vertices along an edge (cf. Def. 4.4), here the situation is
strictly regulated: the prefix can only shrink by one variable, and
only along the outgoing edge of a delimiter vertex.

Proposition 5.4 (uniqueness of the abstraction prefix function). Let
G be a term graph over the signature Σ^λ_S. If P1 and P2 are correct
abstraction prefix functions of G, then P1 = P2.
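
Because every condition in Def. 5.3 is an equation, the prefix of a successor is determined by the prefix of its predecessor, so a candidate prefix function can be computed by a single traversal from the root. The Python sketch below illustrates this for vertices reachable from the root; the graph encoding (an S-vertex lists its successor first and its backlink second) and the function name are assumptions made here for illustration.

```python
def compute_prefixes(graph, root):
    """Propagate prefixes from the root along the equations of Def. 5.3.

    Returns the prefix function as a dict, or None if a conflict is found.
    """
    P = {root: ()}
    todo = [root]
    while todo:
        w = todo.pop()
        label, succs = graph[w]
        if label == 'lam':
            step = [(succs[0], P[w] + (w,))]
        elif label == '@':
            step = [(s, P[w]) for s in succs]
        elif label == 'S':
            # the backlink must target the last abstraction in the prefix
            if not P[w] or succs[1] != P[w][-1]:
                return None
            step = [(succs[0], P[w][:-1])]
        elif label == 'var':
            if not P[w] or succs[0] != P[w][-1]:
                return None
            step = []
        else:                          # black hole
            if P[w] != ():
                return None
            step = []
        for s, p in step:
            if s not in P:
                P[s] = p
                todo.append(s)
            elif P[s] != p:
                return None            # two different prefixes for one vertex
    return P


# let f = λx. f x in f with an S-vertex on the recursive edge ...
G_S = {'v': ('lam', ['a']), 'a': ('@', ['s', 'x']),
       's': ('S', ['v', 'v']), 'x': ('var', ['v'])}
# ... and the same graph without the scope delimiter
G_plain = {'v': ('lam', ['a']), 'a': ('@', ['v', 'x']), 'x': ('var', ['v'])}

print(compute_prefixes(G_S, 'v'))
print(compute_prefixes(G_plain, 'v'))
```

Without the delimiter vertex the recursive edge would force the non-empty prefix of the application onto the root, so no correct abstraction-prefix function exists for the second graph.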

Definition 5.5 (λ-term-graph). A λ-term-graph is a term graph
G = (V, lab, args, r) over Σ^λ_S that has a correct abstraction-prefix
function (which is not a part of G). The class of λ-term-graphs is 𝒯.

Definition 5.6 (eager scope). A λ-term-graph G is called
eager-scope if together with its abstraction-prefix function it meets the
condition in Def. 4.9. 𝒯_eag denotes the class of eager-scope
λ-term-graphs.



5.1 Correspondence between λ-ho- and λ-term-graphs

The correspondences between λ-ho-term-graphs and λ-term-graphs:

HT : ℋ → 𝒯        TH : 𝒯 → ℋ

are defined as follows: For obtaining HT(𝒢) for a 𝒢 ∈ ℋ, insert
scope-delimiters wherever the prefix decreases, as illustrated in
Fig. 6. For obtaining TH(G) for a G ∈ 𝒯, retain the
abstraction-prefix function, and remove every delimiter vertex from G, thereby
connecting its incoming edge with its outgoing edge. For formal
definitions and well-definedness of TH and HT, see [12].
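
The insertion of scope delimiters can be sketched as follows. This Python sketch is one possible reading of the informal description above, not the formal definition from [12]: along every edge from an abstraction or application vertex, one unshared S-vertex with a backlink is inserted for each abstraction by which the prefix of the target falls short of the prefix permitted by Def. 4.4. The graph encoding and the naming of S-vertices as pairs ('S', n) are assumptions made here.

```python
def insert_scope_delimiters(graph, P):
    """Insert S-vertices along the edges at which the prefix decreases."""
    out, counter = {}, 0
    for w, (label, succs) in graph.items():
        if label not in ('lam', '@'):
            out[w] = (label, list(succs))       # variables and black holes
            continue
        # longest prefix that a successor of w may have
        expected = P[w] + (w,) if label == 'lam' else P[w]
        new_succs = []
        for target in succs:
            nxt = target
            # close the scopes expected[len(P[target]):], innermost one last,
            # building the chain of S-vertices from the target upwards
            for i in range(len(P[target]), len(expected)):
                s = ('S', counter)
                counter += 1
                out[s] = ('S', [nxt, expected[i]])   # [successor, backlink]
                nxt = s
            new_succs.append(nxt)
        out[w] = (label, new_succs)
    return out


# λ-ho-term-graph of L = let f = λx. f x in f
G = {'v': ('lam', ['a']), 'a': ('@', ['v', 'x']), 'x': ('var', ['v'])}
P = {'v': (), 'a': ('v',), 'x': ('v',)}

print(insert_scope_delimiters(G, P))
```

Since every edge receives its own chain of delimiters, the result contains no shared S-vertices.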




Figure 6. Left: definition of HT by inserting S-vertices between
edge-connected vertices of a λ-ho-term-graph. Right: interpretation
HT(𝒢) of the eager-scope λ-ho-term-graph 𝒢 in Fig. 4.

Note that a λ-ho-term-graph may have multiple corresponding
λ-term-graphs that differ only with respect to their 'degree' of
S-sharing (the extent to which S-vertices occur shared). HT maps to a
λ-term-graph with no sharing of S-vertices at all.

The proposition below guarantees the usefulness of the translation
HT for implementing functional bisimulation on λ-ho-term-graphs.
In particular, this is due to items (iii) and (iv). As formulated
by item (i), TH is a retraction of HT (and HT a section of TH).
The converse is not the case, yet it holds up to S-sharing by item (ii).
For the proof, we refer to our article [12].

Proposition 5.7 (correspondence with λ-ho-term-graphs).

(i) TH ∘ HT = id_ℋ.

(ii) (HT ∘ TH)(G) →̲^S G holds for all G ∈ 𝒯.

(iii) TH and HT preserve and reflect functional bisimulation →̲ and
bisimulation ↔̲ on ℋ and 𝒯.

(iv) TH and HT preserve and reflect the property eager-scope.

(v) 𝒯 is closed under →̲^S, ←̲^S, and ↔̲^S.

(vi) HT and TH induce isomorphisms between ℋ and 𝒯/↔̲^S.

5.2 Closedness of 𝒯 under functional bisimulation

While preservation of →̲ by HT is necessary for its implementation
via →̲ on 𝒯, the practicality of the interpretation HT also depends
on the closedness of 𝒯 under →̲. Namely, if the bisimulation collapse
G0 = HT(𝒢)|↓ of the interpretation of some 𝒢 ∈ ℋ were not
contained in 𝒯, then the converse interpretation TH could not
be applied to G0 in order to obtain the bisimulation collapse of 𝒢.

A subclass 𝒦 of the term graphs over a signature Σ is called
closed under functional bisimulation if, for all term graphs G, G'
over Σ, whenever G ∈ 𝒦 and G →̲ G', then also G' ∈ 𝒦.

Note that for obtaining this property the use of variable backlinks,
and backlinks for delimiter vertices is crucial (cf. Ex. 5.2).

Yet the class 𝒯 is actually not closed under →̲: see Fig. 7 at the
top for a homomorphism from a non-eager-scope λ-term-graph to
a term graph over Σ^λ_S that is not a λ-term-graph (as suggested by
the overlapping scopes). But the bisimulation collapse of an
eager-scope version of this λ-term-graph is again a λ-term-graph (at the
bottom). This motivates the following theorem, which is proved in
the extended report of [12]. It justifies property (P2) with 𝒯_eag for 𝒯.

Figure 7. 𝒯 is not closed under functional bisimulation, but 𝒯_eag is.

Theorem 5.8. The class 𝒯_eag of eager-scope λ-term-graphs is
closed under functional bisimulation →̲.

5.3 λ-term-graph semantics for λletrec-terms

We will consider in fact two interpretations of λletrec-terms as
λ-term-graphs: first we define ⟦·⟧_𝒯^min as the composition of ⟦·⟧_ℋ and HT;
then we define the semantics ⟦·⟧_𝒯 with more fine-grained S-sharing,
which is necessary for defining a readback with the property (P3).

By composing the interpretation HT of λ-ho-term-graphs as
λ-term-graphs with the λ-ho-term-graph semantics ⟦·⟧_ℋ, a semantics
of λletrec-terms as λ-term-graphs is obtained. There is, however,
a more direct way to define this semantics: by using an adaptation
of the translation rules ℛ in Fig. 3 on which ⟦·⟧_ℋ is based. For this,
let ℛ_S be the result of replacing the rule S in ℛ by the version in
Fig. 8. While applications of this variant of the S-rule also shorten
the abstraction-prefix, they additionally produce a delimiter vertex.

Figure 8. Delimiter-vertex producing version of the S-rule in Fig. 3.

Here, at the end of the translation process, every loop on an
indirection vertex with a prefix of length n is replaced by a chain of
n S-vertices followed by a black hole vertex. Note that, while the
system ℛ_S inherits all of the non-determinism of ℛ, the possible
degrees of freedom have additional impact on the result, because
now they also determine the precise degree of S-vertex sharing.

By analogous stipulations as in Def. 4.11 we define the conditions
under which a λ-term-graph is called eager-scope ℛ_S-generated,
or ℛ_S-generated with minimal prefixes, from a λletrec-term. For
these notions, statements entirely analogous to Prop. 4.12 hold.

Definition 5.9. The semantics ⟦·⟧_𝒯^min for λletrec-terms as
λ-term-graphs is defined as ⟦·⟧_𝒯^min : Ter(λletrec) → 𝒯_eag, L ↦ ⟦L⟧_𝒯^min := the
λ-term-graph that is ℛ_S-generated with minimal prefixes from a
garbage-free version L' of L.

Figure 9. Translation of the λletrec-terms from Ex. 5.14 with the
semantics ⟦·⟧_𝒯^min and ⟦·⟧_𝒯. For legibility some backlinks are merged.

For an example, see Ex. 5.14 below. In ⟦·⟧_𝒯^min, 'min' also
indicates that λ-term-graphs obtained via this semantics exhibit minimal
(in fact no) sharing (two or more incoming edges) of S-vertices. This
is substantiated by the next proposition, in the light of the fact that
HT does not create any shared S-vertices.



Proposition 5.10. ⟦·⟧_𝒯^min = HT ∘ ⟦·⟧_ℋ.

Hence ⟦·⟧_𝒯^min only yields λ-term-graphs without sharing of
S-vertices, and therefore its image cannot be all of 𝒯_eag. As a consequence,
we cannot hope to define a readback function rb with respect to
⟦·⟧_𝒯^min that has the desired property (P3), because that requires that
the image of the semantics is 𝒯_eag in its entirety.

Therefore we modify the definition of ⟦·⟧_𝒯^min to obtain a
λ-term-graph semantics ⟦·⟧_𝒯 with image im(⟦·⟧_𝒯) = 𝒯_eag. This is achieved
by letting the let-binding-structure of the λletrec-term influence the
degree of S-sharing as much as possible, while staying eager-scope.

We say that a λ-ho-term-graph 𝒢 is eager-scope ℛ-generated
with maximal prefixes from a λletrec-term L if 𝒢 is ℛ-generated
from L by a translation process in which in applications of the
let-rule the prefixes are chosen maximally, but so that the eager-scope
property of the process is not compromised. It can be shown that
this condition fixes the prefix lengths per application of the let-rule.

Definition 5.11. The semantics ⟦·⟧_𝒯 for λletrec-terms as
λ-term-graphs is defined as ⟦·⟧_𝒯 : Ter(λletrec) → 𝒯_eag, L ↦ ⟦L⟧_𝒯 :=
the λ-term-graph that is eager-scope ℛ_S-generated with maximal
prefixes from a garbage-free version L' of L.

Proposition 5.12. ⟦L⟧_𝒯^min →̲^S ⟦L⟧_𝒯 holds for all λletrec-terms L.

Now due to this, and due to Prop. 5.7 (iii), the statement of
Thm. 4.15 can be transferred to 𝒯, yielding property (P1) for ⟦·⟧_𝒯.

Theorem 5.13. For all λletrec-terms L1 and L2 the following holds:
⟦L1⟧_λ∞ = ⟦L2⟧_λ∞ if and only if ⟦L1⟧_𝒯 ↔̲ ⟦L2⟧_𝒯.

Example 5.14. Consider the following four λletrec-terms:

L1 = let I = λz. z in λx. λy. let f = x in ((y I) (I y)) (f f)

L2 = λx. let I = λz. z in λy. let f = x in ((y I) (I y)) (f f)

L3 = λx. λy. let I = λz. z, f = x in ((y I) (I y)) (f f)

L3' = λx. let I = λz. z in λy. let f = x, g = I in ((y g) (g y)) (f f)

The three possible fillings of the dashed area in Fig. 9 depict the
translations ⟦L1⟧_𝒯, ⟦L2⟧_𝒯, and ⟦L3⟧_𝒯 = ⟦L3'⟧_𝒯. The translations
of the four terms with ⟦·⟧_𝒯^min are identical:

⟦L1⟧_𝒯^min = ⟦L2⟧_𝒯^min = ⟦L3⟧_𝒯^min = ⟦L3'⟧_𝒯^min = ⟦L1⟧_𝒯.

Figure 10. Readback synthesis rules for computing a representing
λletrec-term from a λ-term-graph. The rules for ⊤- and λ-vertices
have variants for the case that B is empty. For explanations, see
Def. 6.1, (Rb-5).



6. Readback of A-term-graphs 

In this section we describe how from a given A-term-graph G a 
Aietrec-term L that represents G (i.e. for which [L]r = G holds) can 
be 'read back'. For this purpose we define a process based on term 
synthesis rules. It defines a readback function from A-term-graphs 
to Aietrec-terms. We illustrate this process by an example, formulate 
its most important properties, and sketch the proof of |(P3)| 

The idea underlying the definition of the readback procedure is 
the following: For a given A-term-graph G, a spanning tree T for G 
(augmented with a dedicated root node) is constructed that severs 
cycles of G at (some) recursive bindings, and at variable and S-back- 
links. Now the spanning tree T facilitates an inductive bottom-up 
(from the leafs upwards) synthesis process along T, which labels the 
edges of G (except for variable backlinks) with prefixed A| e trec-terms. 
For this process we use local rules (see Fig.|10[l that synthesise labels 
for incoming edges of a vertex from the labels of its outgoing edges. 
Eventually the readback of G is obtained as the label for the edge 
that singles out the root of term graph. 

The design of the readback rules is based on a decision about
where let-bindings are placed in the synthesised term. There is
some freedom for these placements, as certain kinds of shifts
of let-expressions (let-floating steps [14]) preserve the λ-term-graph
interpretation. Here, let-bindings will always be declared in a
let-expression that is placed as high up in the term as possible: a
binding arising from the term synthesised for a shared vertex w is
placed in a let-expression that is created at the enclosing
λ-abstraction of w (the leftmost vertex in the abstraction-prefix
P(w) of w).

Figure 11. Example of the readback synthesis from a λ-term-graph.

Definition 6.1 (readback of λ-term-graphs). Let G ∈ T be a
λ-term-graph. The process of computing the readback of G (a
λ_letrec-term) consists of the following five steps, starting on G:

(Rb-1) Determine the abstraction-prefix function P for G by
performing a traversal over G, and associate with every vertex w of
G its abstraction-prefix P(w).

(Rb-2) Add a new vertex on top with label ⊤, arity 1, and empty
abstraction prefix. Let G' be the resulting term graph, and P' its
abstraction-prefix function.

(Rb-3) Introduce indirection vertices to organise sharing: For every
vertex w of G with two or more incoming non-variable-backlink
edges, add an indirection vertex w_0, redirect the incoming
edges of w that are not variable backlinks to w_0, and direct
the outgoing edge from w_0 to w. In the resulting term graph G''
only indirection vertices are shared; their names will be used.
Extend P' to an abstraction-prefix function P'' for G'' so that
every indirection vertex w_0 gets the prefix of its successor w.

(Rb-4) Construct a spanning tree T'' of G'' by using a depth-first
search (DFS) on G''. Note that all variable backlinks and
S-backlinks, and some of the recursive back-bindings, of G'' are not
contained in T'', because they are back-edges of the DFS.

Footnote to step (Rb-3): Incoming variable backlinks are not counted
as sharing here.

Figure 12. Modification of (two of) the translation rules in Fig. 3
for a variant definition of the λ-term-graph interpretation of
λ_letrec-terms. Here the translation of a let-expression does not
directly spawn translations for the binding equations, but the
in-part has to be translated first.

(Rb-5) Apply the readback synthesis rules from Fig. 10 to G'' with
respect to T''. By this a complete labelling of the edges of
G'' by prefixed λ_letrec-terms is constructed. The rules define
how the labelling for an incoming edge (on top) of a vertex w
is synthesised under the assumption of an already determined
labelling of an outgoing edge of (and below) w. If the outgoing
edge in the rule does not carry a label, then the labelling of the
incoming edge can happen regardless. Note that in these rules:

• full line (dotted line) edges indicate spanning tree
(non-spanning tree) edges, broken line edges either of these sorts;

• abstraction prefixes of vertices are crucial for the 0-vertex rule
and the second indirection vertex rule, where the prefixes in
the synthesised terms are created; in the other rules the prefix
of the assumed term is used; for indicating a correspondence
between a term's and a vertex's abstraction prefix we denote
by v(p) the word of vertices occurring in a term's prefix p;

• the rule for indirection vertices with incoming non-spanning
tree edge introduces an unfinished binding f = ? for f;
unfinished bindings are completed in the course of the process;

• the @-vertex rule applies only if v(p_0) = v(p_1); the operation
∪ used in the synthesised term's prefix builds the union
per prefix variable of the pertaining bindings; if the prefixed
terms (p_0) L_0 and (p_1) L_1 assumed in this rule contain a yet
unfinished binding equation f = ? and a completed equation
f = P at a λ-variable z, then the synthesised term contains
the completed binding f = P for f at z;

• not depicted in Fig. 10 are variants of the ⊤- and λ-vertex rules
for the cases with empty B: then no let-binding is introduced
in the synthesised term, but the term from the in-part is used.

If this process yields the label (*[]) L for the (root-)edge pointing
to the new top vertex of G'', where L is a λ_letrec-term, then we call
L the readback of G.
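
Steps (Rb-3) and (Rb-4) are ordinary graph manipulations, so they can be illustrated independently of the synthesis rules. The following Python sketch is only an illustration under simplifying assumptions, and it is not taken from the authors' Haskell implementation: a graph is a dictionary from vertex names to successor lists, variable backlinks are given as a set of (source, position) pairs, vertex labels and S-vertices are left out, and the names `add_indirections`, `spanning_tree` and the suffix `_ind` are hypothetical. The DFS simply separates tree edges from all other edges; in this simplified setting a non-tree edge need not be a back-edge in the strict DFS sense.

```python
from collections import defaultdict

def add_indirections(succ, var_backlinks):
    """Sketch of (Rb-3). succ maps each vertex to its successor list;
    var_backlinks holds (source, position) pairs of variable backlinks."""
    incoming = defaultdict(list)
    for v, targets in succ.items():
        for i, w in enumerate(targets):
            if (v, i) not in var_backlinks:      # backlinks do not count as sharing
                incoming[w].append((v, i))
    result = {v: list(targets) for v, targets in succ.items()}
    for w, edges in incoming.items():
        if len(edges) >= 2:                      # w is shared
            w0 = w + "_ind"                      # hypothetical name of the indirection vertex
            result[w0] = [w]                     # single outgoing edge w0 -> w
            for v, i in edges:
                result[v][i] = w0                # redirect the incoming edge to w0
    return result

def spanning_tree(succ, root, var_backlinks):
    """Sketch of (Rb-4): DFS from root, splitting edges into tree and non-tree edges."""
    visited, tree, non_tree = {root}, [], []
    def dfs(v):
        for i, w in enumerate(succ[v]):
            if (v, i) in var_backlinks or w in visited:
                non_tree.append((v, i, w))
            else:
                visited.add(w)
                tree.append((v, i, w))
                dfs(w)
    dfs(root)
    return tree, non_tree

# Graph loosely modelled on `let f = λx. f x in f f` (S-vertices omitted),
# with the top vertex of (Rb-2) already added.
g = {"top": ["app0"], "app0": ["lam", "lam"], "lam": ["app1"],
     "app1": ["lam", "var"], "var": ["lam"]}
backlinks = {("var", 0)}
g2 = add_indirections(g, backlinks)
tree, non_tree = spanning_tree(g2, "top", backlinks)
```

In this example the abstraction vertex `lam` has three incoming edges that are not variable backlinks, so a single indirection vertex `lam_ind` is introduced, and afterwards only that indirection vertex is shared.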

Note that firing of the rules in step (Rb-5) of the readback process 
proceeds in bottom-up direction in the spanning tree, starting from 
the back-edges, with some room for parallelism concerning work in 
different subtrees. Furthermore observe that on all directed edges e 
(spanning tree edges or back edges) the rule applied to derive the 
edge label is uniquely determined by (is tied to) the label of the target 
vertex v of e, with the single exception of v being an indirection 
vertex. In that case one of the two indirection vertex rules applies, 
depending on whether e is a spanning-tree edge or a back-edge. 

Proposition 6.2. Let G be a λ-term-graph. The process described
in Def. 6.1 produces a complete edge labelling of the (modified)
term graph, with label (*[]) L for the topmost edge, where L is a
λ_letrec-term. Hence it yields L as the readback of G. Thus Def. 6.1
defines a function rb : T → Ter(λ_letrec), the readback function.

Example 6.3. See Fig. 11 for the illustration of the synthesis of the
readback from an example λ-term-graph. Full line edges are in the
spanning tree, dotted line edges are not. Note that at the top vertex,
no empty let-binding is created since the variant of the ⊤-vertex rule
for empty binding groups is applied.

The following theorem validates property (P3) with T_eag for T.

Theorem 6.4. For all G ∈ T_eag: (⟦·⟧_T ∘ rb)(G) = ⟦rb(G)⟧_T ≃ G,
i.e., rb is a right-inverse of ⟦·⟧_T, and ⟦·⟧_T a left-inverse of rb,
up to isomorphism. Hence rb is injective, and ⟦·⟧_T is surjective,
thus im(⟦·⟧_T) = T_eag.

Sketch of Proof. Graph translation steps can be linked with
corresponding readback steps in order to establish that the former
roughly reverse the latter. Roughly, because e.g. reversing a
λ-readback step necessitates both a λ- and a let-translation step.
(For illustrations of the stepwise reversal of readback steps through
translation steps we refer to two figures in the extended version of
this paper [15].) However, this correspondence holds only for a
modification of the translation rules R_S from Fig. 5, Fig. 8, where
the rules let (for let-expressions) and f (for occurrences of
recursion variables) are replaced by the locally-operating versions
in Fig. 12, and a rule for creating a top vertex is added. Now the
translation of a let-expression no longer directly spawns
translations of the defined recursive bindings; instead, the bindings
are translated later, once their calls have been reached during the
translation of the in-part, or of the definitions of other already
translated bindings. Note that in the let-rule in Fig. 12 function
bindings are associated with the rightmost variable in the prefix,
which corresponds to choosing l_i = n in the let-rule in Fig. 5.
While such a stipulation does not guarantee the eager-scope
translation of every term, it actually does so for all λ_letrec-terms
that are obtained by the readback (on these terms the translation so
defined coincides with ⟦·⟧_T from Def. 4.13).

The proof uses induction on access paths, and an invariant that 
relates the eager-scope property localised for a vertex v with the 
applicability of the S-rule to the readback term synthesised at v. □ 

7. Complexity analysis 

Here we report on a complexity analysis for the individual operations 
from the previous sections, for the used standard algorithms, and 
overall, for compactification and unfolding equivalence. 

In the lemma below, (ii) and (v) justify the property (P4) of our
methods. Items (iii) and (iv) detail the complexity of standard
methods when used for computing bisimulation collapse and bisimilarity
of λ-term-graphs. Note that first-order term graphs can be modelled
by deterministic process graphs, and hence by DFAs. Therefore
bisimilarity of term graphs can be computed via language equivalence
of corresponding DFAs [19] (in time O(n α(n)) [25], where α
is the quasi-constant inverse Ackermann function), and bisimulation
collapse via state minimisation of DFAs (in time O(n log n)) [18].
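
Both reductions can be made concrete on a small scale. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: a term graph is a dictionary mapping each vertex to a pair (label, successor list), and the function names `bisimilar` and `bisimulation_collapse` are hypothetical. The bisimilarity check follows the union-find idea used for DFA equivalence: merge the two roots, then keep merging corresponding successors until a label or arity clash appears. The collapse, however, uses naive partition refinement, which is quadratic in the worst case and therefore does not attain the O(n log n) bound of the minimisation algorithm cited above.

```python
def bisimilar(g1, r1, g2, r2):
    """Union-find check for bisimilarity of two rooted term graphs.
    A term graph maps each vertex to (label, successor list)."""
    # Tag the vertices so that the two graphs are disjoint.
    graph = {(1, v): (lab, [(1, w) for w in ws]) for v, (lab, ws) in g1.items()}
    graph.update({(2, v): (lab, [(2, w) for w in ws]) for v, (lab, ws) in g2.items()})
    parent = {v: v for v in graph}
    size = {v: 1 for v in graph}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    todo = [((1, r1), (2, r2))]
    while todo:
        u, v = todo.pop()
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                         # already identified
        (lu, su), (lv, sv) = graph[u], graph[v]
        if lu != lv or len(su) != len(sv):
            return False                     # labels or arities clash
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                      # union by size
        size[ru] += size[rv]
        todo.extend(zip(su, sv))             # successors must be related too
    return True

def bisimulation_collapse(graph):
    """Naive partition refinement (worst case quadratic)."""
    block = {v: lab for v, (lab, _) in graph.items()}   # start: partition by label
    while True:
        ids, refined = {}, {}
        for v, (_, ws) in graph.items():
            signature = (block[v], tuple(block[w] for w in ws))
            refined[v] = ids.setdefault(signature, len(ids))
        stable = len(ids) == len(set(block.values()))   # no block was split
        block = refined
        if stable:
            break
    collapsed = {}
    for v, (lab, ws) in graph.items():
        collapsed.setdefault(block[v], (lab, [block[w] for w in ws]))
    return collapsed, block

loop1 = {"a": ("f", ["a"])}                        # x = f(x)
loop2 = {"b": ("f", ["c"]), "c": ("f", ["b"])}     # the same unfolding, less shared
other = {"d": ("f", ["e"]), "e": ("g", ["d"])}
shared = {"r": ("@", ["a", "b"]), "a": ("f", ["a"]),
          "b": ("f", ["c"]), "c": ("f", ["b"])}
collapsed, block = bisimulation_collapse(shared)
```

Here `loop1` and `loop2` have the same infinite unfolding f(f(f(...))), so they are bisimilar, and in `shared` the three f-vertices collapse into a single vertex with a self-loop.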

Lemma 7.1. (i) size(⟦L⟧_T) ∈ O(|L|²) for L ∈ Ter(λ_letrec).
(ii) Translating L ∈ Ter(λ_letrec) into ⟦L⟧_T ∈ T takes time O(|L|²).
(iii) Collapsing G ∈ T to G|↓ is in O(size(G) log size(G)).
(iv) Deciding bisimilarity of G_1, G_2 ∈ T requires time O(n α(n))
for n = max{size(G_1), size(G_2)}.
(v) Computing the readback rb(G) for a given G ∈ T requires time
O(n log n), for n = size(G).

Based on this lemma, and on further considerations, we obtain
the following complexity statements for our methods.

Theorem 7.2. (i) The computation, for a λ_letrec-term L with |L| = n,
of a maximally compactified form (rb ∘ |↓ ∘ ⟦·⟧_T)(L) requires
time O(n² log n). By using an S-unsharing operation unsh_S, a
(typically smaller) λ_letrec-term (rb ∘ unsh_S ∘ |↓ ∘ ⟦·⟧_T)(L) of
size O(n log n) can be obtained, with the same time complexity.

(ii) The decision of whether λ_letrec-terms L_1 and L_2 are
unfolding-equivalent requires time O(n² α(n)) for n = max{|L_1|, |L_2|}.
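
The time bounds in this theorem are consistent with a direct substitution of the quadratic size bound from Lemma 7.1 (i) into items (ii) to (v). The following calculation is a reconstruction of that arithmetic, not the authors' proof (which also relies on 'further considerations'); m denotes the size of the term graph, m' the size of the graph handed to the readback, and the last line assumes that α(n²) ∈ O(α(n)), which holds because α grows extremely slowly.

```latex
\begin{align*}
m &= \mathrm{size}([\![L]\!]_{\mathcal{T}}) \in O(n^2)
  && \text{Lemma 7.1 (i), } n = |L| \\
\text{translation:}\quad & O(n^2)
  && \text{Lemma 7.1 (ii)} \\
\text{collapse:}\quad & O(m \log m) = O(n^2 \log n^2) = O(n^2 \log n)
  && \text{Lemma 7.1 (iii)} \\
\text{readback:}\quad & O(m' \log m') \subseteq O(n^2 \log n), \quad m' \le m
  && \text{Lemma 7.1 (v)} \\
\text{bisimilarity:}\quad & O(m\,\alpha(m)) = O(n^2\,\alpha(n^2)) = O(n^2\,\alpha(n))
  && \text{Lemma 7.1 (iv)}
\end{align*}
```

Summing the first four lines gives O(n² log n) for compactification; translating both terms and then applying the last line gives O(n² α(n)) for unfolding equivalence.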

8. Implementation 

We have implemented our methods in Haskell using the Utrecht
University Attribute Grammar System. The implementation is available
at http://hackage.haskell.org/package/maxsharing/. The
output produced for the examples in this paper, and explanations, can
be found in the appendix of the extended version [15] of this paper.

9. Modifications, extensions and applications 

We have described an adaptation of the bisimulation proof method
for λ_letrec-terms. Recognising unfolding equivalence and increasing
sharing are reduced to problems involving first-order term graphs.
The principal idea is to use the nested scope structure of higher-order
terms for an interpretation by term graphs with scope delimiters.

We conclude by describing easy modifications, rather direct ex- 
tensions, and finally, promising areas of application for our methods. 

9.1 Modifications 

Implicit sharing of λ-variables. Multiple occurrences of the same
λ-variable in a λ_letrec-term L are not shared (represented by a shared
variable vertex) in the graph interpretation ⟦L⟧_T. Consequently,
our method compactifies the term λx. x x into λx. let f = x in f f.
Such explicit sharing of variables is excessive for many applications.
It can be remedied easily, namely by unsharing variable vertices
before applying the readback, or by preventing the readback from
introducing let-bindings when only a variable vertex is shared.
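
The first remedy, unsharing variable vertices before the readback, amounts to giving every incoming edge of a shared variable vertex its own copy of that vertex. The Python sketch below illustrates this under assumptions that are not part of the paper: a term graph maps each vertex to a pair (label, successor list), variable vertices carry the label "0", and the function name `unshare_variables` as well as the naming scheme for copies are hypothetical.

```python
def unshare_variables(graph, var_label="0"):
    """Give every incoming edge of a variable vertex its own copy of that vertex."""
    result = {v: (lab, list(ws)) for v, (lab, ws) in graph.items()}
    seen = set()
    for u, (_, ws) in graph.items():
        for i, w in enumerate(ws):
            if graph[w][0] != var_label:
                continue                     # only variable vertices are unshared
            if w not in seen:
                seen.add(w)                  # the first edge keeps the original vertex
                continue
            copy = f"{w}#{len(result)}"      # hypothetical fresh name
            result[copy] = (var_label, list(graph[w][1]))   # same backlink target
            result[u][1][i] = copy           # redirect this edge to the copy
    return result

# Collapsed graph of λx. x x: both argument positions point to one variable vertex.
collapsed = {"lam": ("λ", ["app"]), "app": ("@", ["var", "var"]), "var": ("0", ["lam"])}
unshared = unshare_variables(collapsed)
```

After unsharing, no variable vertex has more than one incoming edge, so a readback as in Section 6 has no shared variable vertex for which a let-binding would be introduced.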

Avoiding aliases produced by the readback. The readback function
in Section 6 is sensitive to the degree of sharing of S-vertices in
the given λ-term-graph: it maps two λ-term-graphs that differ only
in the sharing of S-vertices to different λ_letrec-terms.
Typically, for λ-term-graphs with maximal sharing of S-vertices this
can produce let-bindings that are just 'aliases', such as g being an
alias for I in L_3 from Ex. 5.14. This can be avoided in two ways: by
slightly adapting the readback function, or by performing maximal
unsharing of S-vertices before applying the readback as defined.

Preventing disadvantageous sharing. Introducing sharing at
compile-time can cause 'space leaks', i.e. a needlessly high memory
footprint, at run-time, because 'a large data structure becomes shared
[...], and therefore its space which before was reclaimed by garbage
collection now cannot be reclaimed until its last reference is used'
[9]. For this reason, realisations of CSE [6] restrict the locally
operating rewrite rules employed for introducing sharing by suitable
conditions that account for the type of potentially shared
subexpressions, and their strictness in the program. For our global
method of introducing sharing via the bisimulation collapse, a
different approach is needed.

Here the bisimulation collapse can be restricted so that sharing is
not introduced at vertices that should not be shared. More precisely,
it can be prevented that any unshared vertex (in-degree one) from a
pre-determined set of 'sharing-unfit' vertices has a shared
vertex (in-degree greater than one) as its image in the bisimulation
collapse. This can be achieved by modifying the graph interpretation
⟦·⟧_T. Any set of sharing-unfit positions in L gives rise to a set of
sharing-unfit vertices in ⟦L⟧_T. In the modification of ⟦L⟧_T, special
back-links are added from every sharing-unfit vertex with in-degree
one to its immediate predecessor. These back-links prevent such a
sharing-unfit vertex v from collapsing with another vertex v' unless
the predecessors of v and v' collapse as well.



A more general notion of readback. Condition (P3) is rather rigorous
in that it imposes a sharing structure on λ_letrec that is specific to
λ-term-graphs (degrees of S-sharing). For a weaker version of (P3)
with ↔^S in place of isomorphism, a readback does not have to be
injective, and, independently of how much S-sharing a translation
into λ-term-graphs introduces, a readback function always exists.

9.2 Extensions 

Full functional languages. In order to support programming
languages that are based on λ_letrec, like Haskell, additional language
constructs need to be supported. Such languages can typically be
desugared into a core language, which comprises only a small subset
of language constructs such as constructors, case statements, and
primitives. These constructs can be represented in an extension of
λ_letrec by additional function symbols. In conjunction with a
desugarer our methods are applicable to full programming languages.

Other programming languages, and calculi with binding constructs.
Most programming languages feature constructs for grouping
definitions that are similar to letrec. We therefore expect that our
methods can be adapted to many imperative languages in particular, and
may turn out to be fruitful for optimising compilers. Our methods for
achieving maximal sharing certainly generalise to theoretical
frameworks, and calculi with binding constructs, such as the
π-calculus [23], and higher-order rewrite systems (e.g. CRSs and
HRSs [30]) as used here for the formalisation of λ_letrec.

Fully-lazy lambda-lifting. There is a close connection between our
methods and fully-lazy lambda-lifting [20, 28]. In particular, the
required-variable and scope analysis of a λ_letrec-term L on which the
λ-term-graph translation ⟦L⟧_T is based is closely analogous to the
one needed for extracting from L the supercombinators in the result
of fully-lazy lambda-lifting L. Moreover, the fully-lazy
lambda-lifting transformation can even be implemented in a natural way
on the basis of our methods, namely as the composition of the
translation ⟦·⟧_T into λ-term-graphs with a variant readback function
that, for a given λ-term-graph, synthesises the system of
supercombinators instead of the λ_letrec-term.

Maximal sharing on supercombinator translations of λ_letrec-terms.
λ_letrec-terms L correspond to supercombinator systems, the result
of fully-lazy lambda-lifting L: the combinators in such a system
correspond to 'extended scopes' [11] (or 'skeletons' [2]) in L, and
supercombinator reduction steps on the system correspond to weak
β-reduction steps on L. In the case of λ-calculus this has been
established by Balabonski [2]. Via this correspondence the
maximal-sharing method for λ_letrec-terms can be lifted to obtain a
maximal-sharing method for systems of supercombinators obtained by
fully-lazy lambda-lifting.

Non-eager scope-closure strategies. We focused on eager-scope
translations, because they facilitate maximal sharing, and guarantee
that interpretations of unfolding-equivalent λ_letrec-terms are
bisimilar. Yet every scope-closure strategy [11] induces a translation
and its own notion of maximal sharing. For adapting our maximal
sharing method it is necessary to modify the translation into
first-order term graphs in such a way that the image class obtained is
closed under homomorphism (T is not closed under homomorphism, unlike
its subclass T_eag). This can be achieved by using delimiter vertices
also below variable vertices to close scopes that are still open
[12, report].

Weaker notions of sharing. The presented methods deal with sharing
as expressed by letrec that is horizontal, vertical, or twisted [4]. By
contrast, the construct μ [4, 13] expresses only vertical, and the
non-recursive let only horizontal, sharing. By restricting
bisimulation, our methods can be adapted to the λ-calculus with μ, or
with let.

Nested term graphs. The nested scope structure of a λ_letrec-term
can also be represented by a nested structure of term graphs. The
representation of a λ_letrec-term as a 'nested term graph' [16] starts
with an ordinary term graph in which some of the vertices are
labelled by 'nested' symbols that designate outermost bindings
together with their scope. Any such vertex is additionally associated
with a usual term graph that specifies the subterm context describing
the scope, where any inner scopes are again expressed by nested
symbols. The association between nested symbols and their term
graph specifications is required to be tree-like. The implementation
result developed here can be generalised to show that nested term
graphs can be implemented faithfully by first-order term graphs [16].

9.3 Applications 

Maximal sharing at run-time. Maximal sharing can be applied
repeatedly at run-time in order to regain a maximally shared form,
thereby speeding up evaluation. This is reminiscent of 'collapsed
tree rewriting' [29] for evaluating first-order term graphs represented
as maximally shared dags. Since the state of a program in the
memory at run-time is typically represented as a supercombinator graph,
compactification by bisimulation collapse can take place directly on
that graph (see Sec. 9.2); no translation is needed. Compactification
can be coupled with garbage collection, as bisimulation collapse
subsumes some of the work required for a mark and sweep garbage
collector. However, a compromise needs to be found between the
costs for the optimisation and the gained efficiency.

Additional prevention of disadvantageous sharing. While static
analysis methods for preventing sharing that may be disadvantageous at
run-time can be adapted from CSE to the maximal-sharing method
(see Sec. 9.1), this has yet to be investigated for binding-time
analysis [27] and a sharing analysis of partial applications [10].

Code improvement. In programming it is generally desirable to
avoid duplication of code. As an extension of CSE, our method
is able to detect code duplication. The bisimulation collapse of
the term graph interpretation of a program can, together with the
readback, provide guidance on how code can be refactored into
a more compact form. This application requires some fine-tuning
to avoid excessive behaviour like the explicit sharing of variable
occurrences (see Sec. 9.1). Yet for this only lightweight additional
machinery is needed, such as size constraints or annotations to
restrict the bisimulation collapse.

Function equivalence. Recognising whether two programs imple- 
ment the same function is undecidable. Still, this problem is tackled 
by proof assistants, and by automated theorem provers used in 
type-checkers of compilers for dependently-typed programming 
languages such as Agda. For such systems co-inductive proofs are 
more difficult to find than inductive ones, and require more effort by 
the user. Our method for deciding unfolding equivalence could help 
to develop new approaches to finding co-inductive proofs.